perm filename CLVALI.MSG[COM,LSP]7 blob
sn#823733 filedate 1986-08-28 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 Introduction
C00006 ENDMK
C⊗;
∂23-Sep-84 1625 RPG Introduction
To: cl-validation@SU-AI.ARPA
Welcome to the Common Lisp Validation Subgroup.
In order to mail to this group, send to the address:
CL-Validation@su-ai.arpa
Capitalization is not necessary, and if you are directly on the ARPANET,
you can nickname SU-AI.ARPA as SAIL. An archive of messages is kept on
SAIL in the file:
CLVALI.MSG[COM,LSP]
You can read this file or FTP it away without logging in to SAIL.
To communicate with the moderator, send to the address:
CL-Validation-request@su-ai.arpa
Here is a list of the people who are currently on the mailing list:
Person              Affiliation      Net Address
Richard Greenblatt  LMI              "rg%oz"@mc
Scott Fahlman       CMU              fahlman@cmuc
Eric Schoen         Stanford         schoen@sumex
Gordon Novak        Univ. of Texas   novak@utexas-20
Kent Pitman         MIT              kmp@mc
Dick Gabriel        Stanford/Lucid   rpg@sail
David Wile          ISI              Wile@ISI-VAXA
Martin Griss        HP               griss.hplabs@csnet-relay (I hope)
Walter VanRoggen    DEC              wvanroggen@dec-marlboro
Richard Zippel      MIT              rz@mc
Dan Oldman          Data General     not established
Larry Stabile       Apollo           not established
Bob Kessler         Univ. of Utah    kessler@utah-20
Steve Krueger       TI               krueger.ti-csl@csnet-relay
Carl Hewitt         MIT              hewitt-validation@mc
Alan Snyder         HP               snyder.hplabs@csnet-relay
Jerry Barber        Gold Hill        jerryb@mc
Bob Kerns           Symbolics        rwk@mc
Don Allen           BBN              allen@bbnf
David Moon          Symbolics        moon@scrc-stonybrook
Glenn Burke         MIT              GSB@mc
Tom Bylander        Ohio State       bylander@rutgers
Richard Soley       MIT              Soley@mc
Dan Weinreb         Symbolics        DLW@scrc-stonybrook
Guy Steele          Tartan           steele@tl-20a
Jim Meehan          Cognitive Sys.   meehan@yale
Chris Riesbeck      Yale             riesbeck@yale
The first order of business is for each of us to ask people we know who may
be interested in this subgroup if they would like to be added to this list.
Next, we ought to consider who might wish to be the chairman of this subgroup.
Before this happens, I think we ought to wait until the list is more nearly
complete. For example, there are no representatives of Xerox, and I think we
agree that LOOPS should be studied before we make any decisions.
∂02-Oct-84 1318 RPG Chairman
To: cl-validation@SU-AI.ARPA
Now that we've basically got most everyone who is interested on the mailing
list, let's pick a chairman. I suggest that people volunteer for chairman.
The duties are to keep the discussion going, to gather proposals and review
them, and to otherwise administer the needs of the mailing list. I will
retain the duties of maintaining the list itself and the archives, but
otherwise the chairman will be running the show.
Any takers?
-rpg-
∂05-Oct-84 2349 WHOLEY@CMU-CS-C.ARPA Chairman
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 5 Oct 84 23:49:33 PDT
Received: ID <WHOLEY@CMU-CS-C.ARPA>; Sat 6 Oct 84 02:49:51-EDT
Date: Sat, 6 Oct 1984 02:49 EDT
Message-ID: <WHOLEY.12053193572.BABYL@CMU-CS-C.ARPA>
Sender: WHOLEY@CMU-CS-C.ARPA
From: Skef Wholey <Wholey@CMU-CS-C.ARPA>
To: Cl-Validation@SU-AI.ARPA
CC: Dick Gabriel <RPG@SU-AI.ARPA>
Subject: Chairman
I'd be willing to chair this mailing list.
I've been very much involved in most aspects of the implementation of Spice
Lisp, from the microcode to the compiler and other parts of the system, like
the stream system, pretty printer, and Defstruct. A goal of ours is that Spice
Lisp port easily, so most of the system is written in Common Lisp.
Since our code is now being incorporated into many implementations, it's
crucial that it correctly implement Common Lisp. A problem with our code is
that some of it has existed since before the idea of Common Lisp, and we've
spent many man-months tracking the changes to the Common Lisp specification as
the language evolved. I am sure we've got bugs because I'm sure we've missed
"little" changes between editions of the manual.
So, I'm interested first in developing code that will aid implementors in
discovering pieces of the manual they may have accidentally missed, and second
in verifying that implementation X is "true Common Lisp." I expect that the
body of code used for the first purpose will evolve into a real validation
suite as implementors worry about smaller and smaller details.
I've written little validation suites for a few things, and interested parties
can grab those from <Wholey.Slisp> on CMU-CS-C. Here's what I have right now:
Valid-Var.Slisp      Checks to see that all variables and constants
                     in the CLM are there, and satisfy simple tests
                     about what their values should be.
Valid-Char.Slisp     Exercises the functions in the Characters
                     chapter of the CLM.
Valid-Symbol.Slisp   Exercises the functions in the Symbols chapter
                     of the CLM.
Some of the tests in the files may seem silly, but they've uncovered a few bugs
in both Spice Lisp and the Symbolics CLCP.
I think more programs that check things out a chapter (or section) at a time
would be quite valuable, and I'm willing to devote some time to coordinating
such programs into a coherent library.
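To make the shape of such a chapter-level checker concrete, here is a
minimal sketch in the spirit of Valid-Var.Slisp (the function name, the
particular variables checked, and the reporting style are invented for
illustration and are not taken from the actual files):
(defun validate-some-variables ()
  ;; Check that a few CLM-defined variables and constants exist and
  ;; have plausible values; print a line for each failure and return
  ;; T only if everything passed.
  (let ((ok t))
    (flet ((check (description passed)
             (unless passed
               (setq ok nil)
               (format t "~&Failed: ~A~%" description))))
      (check "MOST-POSITIVE-FIXNUM is a positive integer"
             (and (boundp 'most-positive-fixnum)
                  (integerp most-positive-fixnum)
                  (plusp most-positive-fixnum)))
      (check "*PACKAGE* is bound to a package"
             (and (boundp '*package*) (packagep *package*)))
      (check "PI is a float between 3.14 and 3.15"
             (and (boundp 'pi) (floatp pi) (< 3.14 pi 3.15))))
    ok))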
--Skef
∂13-Oct-84 1451 RPG Chairman
To: cl-validation@SU-AI.ARPA
Gary Brown of DEC, Ellen Waldrum of TI, and Skef Wholey of CMU
have volunteered to be chairman of the Validation subgroup. Perhaps
these three people could decide amongst themselves who should be
chairman and let me know by October 24.
-rpg-
∂27-Oct-84 2159 RPG Hello folks
To: cl-validation@SU-AI.ARPA
We now have a chairman of the charter: Bob Kerns of Symbolics. I think
he will make an excellent chairman. For your information I am including
the current members of the mailing list.
I will now let Bob take over responsibility for the discussion.
Dave Matthews       HP               "hpfclp!validation%hplabs"@csnet-relay
Ken Sinclair        LMI              "khs%mit-oz"@mit-mc
Gary Brown          DEC              Brown@dec-hudson
Ellen Waldrum       TI               WALDRUM.ti-csl@csnet-relay
Skef Wholey         CMU              Wholey@cmuc
John Foderaro       Berkeley         jkf@ucbmike.arpa
Cordell Green       Kestrel          Green@Kestrel
Richard Greenblatt  LMI              "rg%oz"@mc
Richard Fateman     Berkeley         fateman@berkeley
Scott Fahlman       CMU              fahlman@cmuc
Eric Schoen         Stanford         schoen@sumex
Gordon Novak        Univ. of Texas   novak@utexas-20
Kent Pitman         MIT              kmp@mc
Dick Gabriel        Stanford/Lucid   rpg@sail
David Wile          ISI              Wile@ISI-VAXA
Martin Griss        HP               griss.hplabs@csnet-relay (I hope)
Walter VanRoggen    DEC              wvanroggen@dec-marlboro
Richard Zippel      MIT              rz@mc
Dan Oldman          Data General     not established
Larry Stabile       Apollo           not established
Bob Kessler         Univ. of Utah    kessler@utah-20
Steve Krueger       TI               krueger.ti-csl@csnet-relay
Carl Hewitt         MIT              hewitt-Validation@mc
Alan Snyder         HP               snyder.hplabs@csnet-relay
Jerry Barber        Gold Hill        jerryb@mc
Bob Kerns           Symbolics        rwk@mc
Don Allen           BBN              allen@bbnf
David Moon          Symbolics        moon@scrc-stonybrook
Glenn Burke         MIT              GSB@mc
Tom Bylander        Ohio State       bylander@rutgers
Richard Soley       MIT              Soley@mc
Dan Weinreb         Symbolics        DLW@scrc-stonybrook
Guy Steele          Tartan           steele@tl-20a
Jim Meehan          Cognitive Sys.   meehan@yale
Chris Riesbeck      Yale             riesbeck@yale
∂27-Oct-84 2202 RPG Correction
To: cl-validation@SU-AI.ARPA
The last message about Bob Kerns had a typo in it. He is chairman
of the validation subgroup, not the charter subgroup. Now you
know my secret about sending out these announcements!
-rpg-
∂02-Nov-84 1141 brown@DEC-HUDSON First thoughts on validation
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 2 Nov 84 11:38:53 PST
Date: Fri, 02 Nov 84 14:34:24 EST
From: brown@DEC-HUDSON
Subject: First thoughts on validation
To: cl-validation@su-ai
Cc: brown@dec-hudson
I am Gary Brown, and I supervise the Lisp Development group at Digital.
I haven't seen any mail about validation yet, so this is to get things
started.
I think there are three areas we need to address:
1) The philosophy of validation - What are we going to validate and
what are we explicitly not going to check?
2) The validation process - What kind of mechanism should be used to
implement the validation suite, to maintain it, to update it and
actually validate Common Lisp implementations?
3) Creation of an initial validation suite - I believe we could disband
after reporting on the first two areas, but it would be fun if we
could also create a prototype validation suite. Plus, we probably
can't do a good job specifying the process if we haven't experimented.
Here are my initial thoughts about these three areas:
PHILOSOPHY
We need to clearly state what the validation process is meant to
accomplish and what it is not intended to accomplish. There are
aspects of a system of interest to users which we cannot validate.
For example, language validation should not be concerned with:
- The performance/efficiency of the system under test. There should
be no timing tests built into the validation suite.
- The robustness of the system. How it responds to errors and the
usefulness of its error messages should not be considerations
in the design of tests.
- Support tools such as debuggers and editors should
not be tested or reported on.
In general, the validation process should report only on whether or
not the implementation is a legal Common Lisp as defined by the
Common Lisp reference manual. Any other information derived from
the testing process should not be made public. The testing process
must not produce information which can be used by vendors as advertisements
for their implementations or to disparage other implementations.
We need to state how we will test language elements which are ill-defined
in the reference manual. For example, if the manual states that it
is "an error" to do something, then we cannot write a test for that
situation. However, if the manual states that an "error is signaled"
then we should verify that.
There are several functions in the language whose action is implementation
dependent. I don't see how we can write a test for INSPECT or for
the printed appearance when *PRINT-PRETTY* is on (however, we can
ensure that what is printed is still READable).
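That READable-output check is easy to express; here is one possible
sketch (the function name and the choice of EQUAL as the comparison are
arbitrary, and a real suite would try many kinds of objects):
(defun prints-readably-p (object)
  ;; Print OBJECT with *PRINT-PRETTY* on, READ the text back, and
  ;; make sure an EQUAL object comes out the other end.
  (let ((*print-pretty* t))
    (equal object (read-from-string (prin1-to-string object)))))

;; e.g. (prints-readably-p '(defun f (x) (list x "a string" #\c 3/4)))
;; should return T in a conforming implementation.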
PROCESS
We need to describe a process for language validation. We could
have a very informal process where the test programs are publicly
available and potential customers acquire and run the tests. However,
I think we need, at least initially, a more formal process.
A contract should be awarded (with ARPA money?) to some third-party
software house to produce and maintain the validation programs, to
execute the tests, and to report the results. I believe the Ada
validation process works something like this:
- Every six months a "field test" version of the validation suite
is produced (and the previous field test version is made the
official version). Interested parties can acquire the programs,
run them, and comment back to SofTech.
- When an implementation wants to validate, it tells some government
agency, gets the current validation suite, runs it, and sends all
the output back.
- An appointment is then set up, and people from the validation agency
come to the vendor and run all the tests themselves, again bundling up
the output and taking it away.
- Several weeks later, the success of the testing is announced.
This seems like a reasonable process to me. We might want to modify
it by:
- Having the same agency that produced the tests validate the results.
- Getting rid of the on-site visit requirement; it's expensive. I
think the vendor needs to include a check for $10,000 when
they request validation. That might be hard for universities
to justify.
Some other things I think need to be set up are:
- A good channel from the test producers to the language definers
for quick clarifications and to improve the manual
- Formal ways to complain about the contents of tests
- Ways for new tests to be suggested. Customers are sure to
find bugs in validated systems, so it would be invaluable if
they could report these as holes in the test system.
A FIRST CUT
To do a good job defining the validation process, I think we need to
try to produce a prototype test system. At Digital we have already
expended considerable effort writing tests for VAX LISP and I assume that
everyone else implementing Common Lisp has done the same. Currently, our
test software is considered proprietary information. However, I believe
that we would be willing to make it public domain if the other vendors
were willing to do the same.
If some kind of informal agreement can be made, we should try to specify
the form of the tests, have everyone convert their applicable tests
to this form and then exchange tests. This will surely generate
a lot of information on how the test system should be put together.
-Gary Brown
∂04-Nov-84 0748 FAHLMAN@CMU-CS-C.ARPA Second thoughts on validation
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 4 Nov 84 07:47:00 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Sun 4 Nov 84 10:47:06-EST
Date: Sun, 4 Nov 1984 10:47 EST
Message-ID: <FAHLMAN.12060893556.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To: cl-validation@SU-AI.ARPA
Subject: Second thoughts on validation
I agree with all of Gary Brown's comments on the proper scope of
validation. The only point that may cause difficulty is the business
about verifying that an error is signalled in all the places where this
is specified. The problem there is that until the Error subgroup does
its thing, we have no portable way to define a Catch-All-Errors handler
so that the validation program can intercept such signals and proceed.
Maybe we had better define such a hook right away and require that any
implementation that wants to be validated has to support this, in
addition to whatever more elegant hierarchical system eventually gets
set up. The lack of such a universal ERRSET mechanism is clearly a
design flaw in the language. We kept putting this off until we could
figure out what the ultimate error handler would look like, and so far
we haven't done that.
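The hook being asked for here amounts to something like the following
sketch, written with the HANDLER-CASE macro that the later ANSI condition
system provides; in 1984 each implementation would have had to supply an
equivalent primitive, so treat this purely as an illustration of the
needed behavior:
(defmacro expect-error (form)
  ;; Evaluate FORM, returning T if it signals an error and NIL if it
  ;; returns normally, without ever landing in the debugger.
  `(handler-case (progn ,form nil)
     (error () t)))

;; A validation module could then write checks such as:
;;   (expect-error (car 3))        ; should be T
;;   (expect-error (car '(1 2)))   ; should be NIL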
As for the process, I think that the validation suite is naturally going
to be structured as a series of files, each of which contains a function
that will test some particular part of the language: a chapter's worth
or maybe just some piece of a chapter such as lambda-list functionality.
That way, people can write little chunks of validation without being
overwhelmed by the total task. Each such file should have a single
entry point to a master function that runs everything else in the file.
These things should print out an informative message whenever they notice
an implementation error. They can also print out some other commentary
at the implementor's discretion, but probably there should be a switch
that will muzzle anything other than hard errors. Finally, there should
be some global switch that starts out as NIL and gets set to T whenever
some module finds a clear error. If this is still NIL after every
module has done its testing, the implementation is believed to be
correct. I was going to suggest a counter for this, but then we might
get some sales rep saying that Lisp X has 14 validation errors and our
Lisp only has 8. That would be bad, since some errors are MUCH more
important than others.
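A skeleton of that structure might look like this (every name in it --
the global flag, the muzzle switch, the sample module, the driver -- is
hypothetical, shown only to make the shape concrete):
(defvar *validation-failed* nil
  "Set to T by any module that detects a clear implementation error.")

(defvar *verbose* nil
  "When NIL, muzzle all commentary other than hard errors.")

(defun report-failure (format-string &rest args)
  ;; Record that something failed and describe it.
  (setq *validation-failed* t)
  (apply #'format t format-string args))

;; Each file would define one master entry point like this one.
(defun test-lambda-lists ()
  (when *verbose*
    (format t "~&Testing lambda-list functionality ...~%"))
  (unless (equal (funcall #'(lambda (a &optional (b 2)) (list a b)) 1)
                 '(1 2))
    (report-failure "~&&OPTIONAL default values are handled incorrectly.~%")))

;; The top-level driver runs every module and reports the verdict.
(defun run-all-modules ()
  (setq *validation-failed* nil)
  (test-lambda-lists)
  ;; ... calls to the other modules go here ...
  (if *validation-failed*
      (format t "~&At least one module found an implementation error.~%")
      (format t "~&No errors found; the implementation is believed correct.~%")))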
To get the ball rolling, we could begin collecting public-domain
validation modules in some place that is easily accessible by arpanet.
As these appear, we can informally test various implementations against
them to smoke out any inconsistencies or disagreements about the tests.
I would expect that when this starts, we'll suddenly find that we have a
lot of little questions to answer about the language itself, and we'll
have to do our best to resolve those questions quickly. Once we have
reached a consensus that a test module is correct, we can add it to some
sort of "approved" list, but we should recognize that, initially at
least, the testing module is as likely to be incorrect as the
implementation.
As soon as possible, this process of maintaining and distributing the
validation suite (and filling in any holes that the user community does
not fill voluntarily) should fall to someone with a DARPA contract to do
this. No formal testing should begin until this organization is in
place and until trademark protection has been obtained for "DARPA
Validated Common Lisp" or whatever we are going to call it. But a lot
can be done informally in the meantime.
I don't see a lot of need for expensive site visits to do the
validating. It certainly doesn't have to be a one-shot win-or-lose
process, but can be iterative until all the tests are passed by the same
system, or until the manufacturer decides that it has come as close as
it is going to for the time being. Some trusted (by DARPA), neutral
outside observer needs to verify that the hardware/software system in
question does in fact run the test without any chicanery, but there are
all sorts of ways of setting that up with minimal bureaucratic hassle.
We should probably not be in the business of officially validating
Common Lisps on machines that are still under wraps and are not actually
for sale, but the manufacturers (or potential big customers) could
certainly run the tests for themselves on top-secret prototypes and be
ready for official validation as soon as the machine is released to the
public.
I'm not sure how to break the deadlock in which no manufacturer wants to
be the first to throw his proprietary validation software into the pot.
Maybe this won't be a problem, if one of the less bureaucratic companies
just decides to take the initiative here. But if there is such a
deadlock, I suppose the way to proceed is first to get a list of what
each company proposes to offer, then to get agreement from each that it
will donate its code if the others do likewise, then to get some lawyer
(sigh!) to draw up an agreement that all this software will be placed in
the public domain on a certain date if all the other companies have
signed the agreement by that date. It would be really nice to avoid
this process, however. I see no advantage at all for a company to have
its own internal validation code, since until that code has been
publicly scrutinized, there is no guarantee that it would be viewed as
correct by anyone else or that it will match the ultimate standard.
-- Scott
∂07-Nov-84 0852 brown@DEC-HUDSON test format
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 7 Nov 84 08:43:57 PST
Date: Wed, 07 Nov 84 11:40:37 EST
From: brown@DEC-HUDSON
Subject: test format
To: cl-validation@su-ai
First, I would hope that submission of test software will not require
any lawyers. I view this as a one-time thing, the only purpose of which
is to get some preliminary test software available to all implementations,
and to give this committee some real data on language validation.
The creation and maintenance of the real validation software should be
the business of the third party funded to do this. I would hope that
they can use what we produce, but that should not be a requirement.
If we are going to generate some preliminary tests, we should develop
a standard format for the tests. I have attached a condensed and
reorganized version of the "developer's guide" for our test system.
Although I don't think our test system is particularly elegant, it
basically works. There are a few things I might change someday:
- The concept of test ATTRIBUTES is not particularly useful. We
have never run tests by their attributes but always run a whole
file full of them.
- The expected result is not evaluated (under the assumption that
if it were, most of the time you would end up quoting it). That
is sometimes cumbersome.
- There is no built-in way to check multiple-value returns. You
make the test case do a MULTIPLE-VALUE-LIST and look at the list.
That is sometimes cumbersome, and relatively easy to fix.
- We haven't automated the analysis of the test results.
- Our test system is designed to handle lots of little tests, and I
think that it doesn't simplify writing complex tests. I have
never really thought about what kind of tools would be useful.
If we want to try to build some tests, I am willing to change our test
system to incorporate any good ideas and make it available.
-Gary
1 A SAMPLE TEST DEFINITION
Here is the test for GET.
(def-lisp-test (get-test :attributes (symbols get)
:locals (clyde foo))
"A test of get. Uses the examples in the text."
((fboundp 'get) ==> T)
((special-form-p 'get) ==> NIL)
((macro-function 'get) ==> NIL)
((progn
(setf (symbol-plist 'foo) '(bar t baz 3 hunoz "Huh?"))
(get 'foo 'bar))
==> T)
((get 'foo 'baz) ==> 3)
((get 'foo 'hunoz) ==> "Huh?")
((prog1
(get 'foo 'fiddle-sticks)
(setf (symbol-plist 'foo) NIL))
==> NIL)
((get 'clyde 'species) ==> NIL)
((setf (get 'clyde 'species) 'elephant) ==> elephant)
((get 'clyde) <error>)
((prog1
(get 'clyde 'species)
(remprop 'clyde 'species))
==> elephant)
((get) <error>)
((get 2) <error>)
((get 4.0 'f) <error>))
Notice that everything added to the property list is taken off again,
so that the test's second run will also work. Notice also that it
isn't wise to start by testing for
((get 'foo 'baz) ==> NIL)
as someone may have decided to give FOO the property BAZ already in
another test.
2 DEFINING LISP TESTS
Tests are defined with the DEF-LISP-TEST macro.
DEF-LISP-TEST {name | (name &KEY :ATTRIBUTES :LOCALS)} [macro]
[doc-string] test-cases
3 ARGUMENTS TO DEF-LISP-TEST
3.1 Name
NAME is the name of the test. Please use the convention of
calling a test FUNCTION-TEST, where FUNCTION is the name of (one of)
the function(s) or variable(s) tested by that test. The symbol name
will have the expanded test code as its function definition and the
following properties:
o TEST-ATTRIBUTES - A list of all the attribute symbols which
have this test on their TEST-LIST property.
o TEST-DEFINITION - The expanded test code. Normally the
function value of the test is compiled; the value of this
property is EVALed to run the test interpreted.
o TEST-LIST - The list of tests with NAME as an attribute.
This list will contain at least NAME.
3.2 Attributes
The value of :ATTRIBUTES is a list of "test attributes". NAME
will be added to this list. Each symbol on this list will have NAME
added to the list which is the value of its TEST-LIST property.
3.3 Locals
Local variables can be specified and bound within a test by
specifying the :LOCALS keyword followed by a list of the form used in a
let var-list. For example, specifying the list (a b c) causes a, b
and c each to be bound to NIL during the run of the test; the list ((a
1) (b 2) (c 3)) causes a to be bound to 1, b to 2, and c to 3 during
the test.
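For example (an invented definition, not one from the actual files), a
test using :LOCALS in both forms might contain cases such as:
(def-lisp-test (push-test :attributes (lists push)
                          :locals (stack (n 3)))
  "A test of PUSH.  STACK starts out bound to NIL and N to 3."
  ((push n stack) ==> (3))
  ((push 4 stack) ==> (4 3)))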
3.4 Documentation String
DOC-STRING is a normal documentation string of documentation type
TESTS. To see the documentation string of a function FOO-TEST, use
(DOCUMENTATION 'FOO-TEST 'TESTS). The documentation string should
include the names of all the functions and variables to be tested in
that test. Mention if there is anything missing from the test, e.g.
tests of the text's examples.
3.5 Test Cases
TEST-CASES (the remainder of the body) is a series of test cases.
Each test case is a list of a number of elements as follows. The
order specified here must hold.
3.5.1 Test Body -
A form to be executed as the test body. If it returns multiple
values, only the first will be used.
3.5.2 Failure Option -
The symbol <FAILURE> can be used to indicate that the test case
is known to cause an irrecoverable error (e.g. it goes into an
infinite loop). When the test case is run, the code is not executed,
but a message is printed to remind you to fix the problem. This
should be followed by normal result options. Omission of this option
allows the test case to be run normally.
3.5.3 Result Options -
3.5.3.1 Comparison Function And Expected Result -
The value of the Test Body will be compared with the Expected Result using the
function EQUAL if you use
==> expected-result
or with the function you specify if you use
=F=> function expected-result
There MUST be white-space after ==> and =F=>, as they are treated as
symbols. Notice that neither function nor expected-result should be
quoted. "Function" must be defined; an explicit lambda form is legal.
"Expected-Result" is the result you expect in evaluating "test-body".
It is not evaluated. The comparison function will be called in this
format:
(function test-body 'expected-value)
3.5.3.2 Errors -
<ERROR> - The test case is expected to signal an error.
This is an alternative to the comparison functions listed above.
There should not be anything after the symbol <ERROR>. It checks that
an error is signaled when the test case is run interpreted, and that
an error is signaled either during the compilation of the case or
while the case is being evaluated when the test is run compiled.
3.5.3.3 Throws -
=T=> - throw-tag result - The test is expected to throw to the
specified tag and return something EQUAL to the specified result.
This clause is only required for a small number of tests. There must
be a space after =T=>, as it is treated as a symbol. This is an
alternative to the functions given above. This does not work compiled
at the moment, due to a compiler bug.
4 RUNNING LISP TESTS
The function RUN-TESTS can be called with no arguments to run all
the tests, with a symbol which is a test name to run an individual
test, or with a list of symbols, each of which is an attribute, to run
all tests which have that attribute. Remember that the test name is
always added to the attribute list automatically.
The special variable *SUCCESS-REPORTS* controls whether anything
will be printed for successful test runs. The default value is NIL.
The special variable *START-REPORTS* controls whether a message
containing the test name will be printed at the start of each test
execution. The default value is NIL. If *SUCCESS-REPORTS* is T, this
variable is treated as T also.
The special variable *RUN-COMPILED-TESTS* controls whether the
"compiled" versions of the specified tests will be run. The default
value is T.
The special variable *RUN-INTERPRETED-TESTS* controls whether the
"interpreted" versions of the specified tests will be run. The
default value is T.
The special variable *INTERACTIVE* controls whether you are
prompted after unexpected errors for whether you would like to enter
debug. It uses yes-or-no-p. To continue running tests after
entering debug after one of these prompts, type CONTINUE. If
*INTERACTIVE* is set to T, the test system will do this prompting.
The default value is NIL.
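By way of illustration, typical calls to RUN-TESTS would look like this
(GET-TEST and the SYMBOLS attribute come from the sample definition at
the beginning of this guide):
(run-tests)             ; run every test that has been defined
(run-tests 'get-test)   ; run the single test named GET-TEST
(run-tests '(symbols))  ; run every test having the SYMBOLS attribute

;; Run only the interpreted versions, with start and success reports:
(let ((*success-reports* t)
      (*run-compiled-tests* nil))
  (run-tests '(symbols)))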
5 GUIDE LINES FOR WRITING TEST CASES
1. The first several test cases in each test should be tests for
the existence and correct type of each of the functions/variables to
be tested in that test. A variable, such as
*DEFAULT-PATHNAME-DEFAULTS*, should have tests like these:
((boundp '*default-pathname-defaults*) ==> T)
((pathnamep *default-pathname-defaults*) ==> T)
A function, such as OPEN, should have these tests:
((fboundp 'open) ==> T)
((macro-function 'open) ==> NIL)
((special-form-p 'open) ==> NIL)
A macro, such as WITH-OPEN-FILE, should have these tests:
((fboundp 'with-open-file) ==> T)
((not (null (macro-function 'with-open-file))) ==> T)
Note that, as MACRO-FUNCTION returns the function definition (if it is
a macro) or NIL (if it isn't a macro), we use NOT of NULL of
MACRO-FUNCTION here. Note also that a macro may also be a special
form, so SPECIAL-FORM-P is not used: we don't care what the result
is.
A special form, such as SETQ, should have these tests:
((fboundp 'setq) ==> T)
((not (null (special-form-p 'setq))) ==> T)
Again, note that SPECIAL-FORM-P returns the function definition (if it
is a special form) or NIL (if it isn't), so we use NOT of NULL of
SPECIAL-FORM-P here. Note also that we don't care if special forms
are also macros, so MACRO-FUNCTION is not used.
2. The next tests should be simple tests of each of your
functions. If you start right in with complicated tests, it can
become difficult to unravel simple bugs. If possible, create one-line
tests which only call one of the functions to be tested.
E.g. for +:
((+ 2 10) ==> 12)
3. Test each of the examples given in the Common Lisp Manual.
4. Then test more complicated cases. Be sure to test both with
and without each of the optional arguments and keyword arguments. Be
sure to test what the manual SAYS, not what you know that we do.
5. Then test for obvious cases which should signal an error.
Obvious things to test are that it signals an error if there are too
few or too many arguments, or if the argument is of the wrong type.
E.g. for +
((+ 2 'a) <ERROR>)
6 HINTS
Don't try to be clever. What we need first is a test of
everything. If we decide that we need "smarter" tests later, we can
go back and embellish. Right now we need to have a test that shows
whether the functions and variables we are supposed to have are there,
and that tells whether at first glance the function is behaving
properly. Even with simple tests this test system will be huge.
Don't write long test cases if you can help it. Think about the
kind of error messages you might get and how easy it will be to debug
them.
Remember that, although the test system guarantees that the test
cases within one test are run in the order defined, no guarantee is
made that your tests will be run in the order in which they are
loaded. Do not write tests which depend on other tests having run
before them.
It is now possible to check for cases which should signal errors;
please do.
I have found it easiest to compose and then debug tests which
have no more than 20 cases. Once a test works I often add a number of
cases, however, and I do have some with over 100 cases. Sometimes,
though, tests with as few as 10 cases can be difficult to unravel,
if, for example, the test won't compile properly. Therefore, if there
is a group of related functions which require many tests each, I am
more likely to have a separate test for each function. If testing one
function is made easier by also testing another (e.g.
define-logical-name, translate-logical-name and delete-logical-name),
it can be advantageous to test them together. It is not a good idea
to make the test cases or returned values very large, however. Also,
when many functions are tested in the same test, it is likely that the
tests can get complicated to debug and/or that some aspect of one of
the functions tested could be forgotten. Therefore, I would prefer
that you NOT write, say, four or five tests, each of which is supposed
to test all of the functions in one part of the manual. I would
prefer that a function have a test which is dedicated to it (even if
it is shared with one or two other functions). This means that some
functions will be used not just in tests of themselves, but also in
tests of related functions; but that is ok.
Remember that each test will be run twice by the test system. So
if your test changes something, change it back.
7 EXAMPLES
7.1 Comparison Function
If you use the "( code =F=> comparison-function result )" format,
the result is now determined by doing (comparison-function code (quote
result)).
(2 =F=> < 4) <=> (< 2 4)
(2 =F=> > 4) <=> (> 2 4)
Notice that the new comparison function you introduce is unquoted.
You may also use an explicit lambda form. For example,
(2 =F=> (lambda (x y) (< x y)) 4) <=> (< 2 4)
7.2 Expected Result
Remember that the returned value for a test case is not
evaluated; so "==> elephant" means is it EQUAL to (quote elephant),
not to the value of elephant.
Consequently, this is in error:
((mapcar #'1+ (list 0 1 2 3)) ==> (list 1 2 3 4))
and this is correct:
((mapcar #'1+ (list 0 1 2 3)) ==> (1 2 3 4))
*Tests Return Single Values*
A test returns exactly one value; a test of a function
which returns multiple values must be written as:
(MULTIPLE-VALUE-LIST form)
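For instance (an invented case, not one from the suite), a check of
FLOOR's two return values would be written:
((multiple-value-list (floor 7 2)) ==> (3 1))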
*Testing Side Effects*
A test of a side effecting function must verify that
the function both returns the correct value and
correctly causes the side effect. The following form
is an example of a body that does this:
((LET (FOO) (LIST (SETF FOO '(A B C)) FOO))
==> ((A B C) (A B C)))
7.3 Throw Tags
The throw tag is also not evaluated.
You must have either "==> <result>" or "=F=> comparison-function
<result>" or "=T=> throw-tag <result>" or "<ERROR>" in each test case.
Remember that you may no longer use <-T- or <-S-. For example, this
would be correct:
((catch 'samson
(throw 'delilah 'scissors))
=T=> delilah scissors)
This test case would cause an unexpected error:
((catch 'samson
(throw 'delilah 'scissors))
==> scissors)
7.4 Expected Failures
Any test case can have the <FAILURE> option inserted to indicate
that the code should not be run. For example, these test cases are
innocuous:
((dotimes (count 15 7)
(setf count (1- count)))
<failure> ==> 7)
((dotimes (count 15 7)
(setf count (1- count)))
<failure> =F=> <= 7)
((throw 'samson (dotimes (count 15 7)
(setf count (1- count))))
<failure> =T=> samson 7)
((car (dotimes (count 15 7)
(setf count (1- count))))
<failure> <error>)
Obviously, you are not expected to introduce infinite loops into the
test cases deliberately.
7.5 Sample Error And Success Reports
A test with cases which all succeed will run with no output if
*SUCCESS-REPORTS* is NIL; if it is set to T, output will look like
this:
************************************************************************
Starting: GET-TEST
A test of get. Uses the examples in the text.
TESTS:GET-TEST succeeded in compiled cases
1 2 3 4 5 6 7 8 9 10 11 12 13 14
TESTS:GET-TEST succeeded in interpreted cases
1 2 3 4 5 6 7 8 9 10 11 12 13 14
If a test case evaluates properly but returns the wrong value, an
error report will be made irrespective of the setting of
*SUCCESS-REPORTS*. The reports include the test case code, the
expected result, the comparison function used, and the actual result.
For example, if you run this test:
(def-lisp-test (+-test :attributes (numbers +))
((+) ==> 0)
((+ 2 3) ==> 4)
((+ -4 -5) =F=> >= 0))
The second and third cases are wrong, so there will be bug reports
like this:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:+-TEST
Error in compiled case 2.
Expected: (+ 2 3)
to be EQUAL to: 4
Received: 5
-----------------------------------------------------------------------
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:+-TEST
Error in compiled case 3.
Expected: (+ -4 -5)
to be >= to: 0
Received: -9
------------------------------------------------------------------------
Unexpected errors cause a report which includes the code which
caused the error, the expected result, the error condition, and the
error message from the error system. As with other errors, these bugs
are reported regardless of the setting of *SUCCESS-REPORTS*. For
example:
(def-lisp-test (=-test :attributes (numbers =))
((fboundp '=) ==> T)
((macro-function '=) ==> NIL)
((special-form-p '=) ==> NIL))
The following report is given if MACRO-FUNCTION is undefined:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:=-TEST compiled case 2 caused an unexpected
correctable error in function *EVAL.
Expected: (MACRO-FUNCTION '=)
to be EQUAL to: NIL
The error message is:
Undefined function: MACRO-FUNCTION.
-----------------------------------------------------------------------
8 RUNNING INDIVIDUAL TEST CASES
The interpreted version of a test case can be run individually.
Remember that if any variables are used which are modified in previous
test cases, the results will not be "correct"; for example, any local
variables bound for the test with the :LOCALS keyword are not bound if
a test case is run with this function. The format is
(RUN-TEST-CASE test-name test-case)
Test-name is a symbol; test-case is an integer.
9 PRINTING TEST CASES
There are some new functions:
(PPRINT-TEST-DEFINITION name)
(PPRINT-TEST-CASE name case-number)
(PPRINT-ENTIRE-TEST-CASE name case-number)
(PPRINT-EXPECTED-RESULT name case-number)
In each case, name is a symbol. In the latter three cases,
case-number is a positive integer.
PPRINT-TEST-DEFINITION pretty prints the expanded test code for a
test.
PPRINT-TEST-CASE pretty prints the test code for the body of a
test case; i.e. the s-expression on the left of the arrow.
PPRINT-ENTIRE-TEST-CASE pretty prints the entire expanded test
code for the case in question, i.e. rather more than does
PPRINT-TEST-CASE and rather less than PPRINT-TEST-DEFINITION.
PPRINT-EXPECTED-RESULT pretty prints the expected result for the
test case specified. This cannot be done for a case which is expected
to signal an error, as in that case there is no comparison of expected
and actual result.
∂09-Nov-84 0246 RWK@SCRC-STONY-BROOK.ARPA Hello
Received: from SCRC-STONY-BROOK.ARPA by SU-AI.ARPA with TCP; 9 Nov 84 02:46:18 PST
Received: from SCRC-HUDSON by SCRC-STONY-BROOK via CHAOS with CHAOS-MAIL id 123755; Thu 8-Nov-84 21:32:33-EST
Date: Thu, 8 Nov 84 21:33 EST
From: "Robert W. Kerns" <RWK@SCRC-STONY-BROOK.ARPA>
Subject: Hello
To: cl-validation@SU-AI.ARPA
Message-ID: <841108213326.0.RWK@HUDSON.SCRC.Symbolics.COM>
Hello. Welcome to the Common Lisp Validation committee. Let me
introduce myself, in general terms, first.
I am currently the manager of Lisp System Software at Symbolics,
giving me responsibility for overseeing our Common Lisp effort,
among other things. Before I became a manager, I was a developer
at Symbolics. In the past I've worked on Macsyma, MacLisp and NIL
at MIT, and I've worked on object-oriented systems built on top of them.
At Symbolics, we are currently preparing our initial Common Lisp
offering for release. Symbolics has been a strong supporter of Common
Lisp in its formative years, and I strongly believe that needs to
continue. Why do I mention this? Because I think one form of support
is to contribute our validation tests as we collect and organize them.
I urge other companies to do likewise. I believe we all have
far more to gain than to lose. I believe there will be far more
validation code available in the aggregate than any one company
will have available by itself. In addition, validation tests from
other places have the advantage of bringing a fresh perspective
to your testing. It is all too easy to test for the things you
know you made work, and far too difficult to test for the more
obscure cases.
As chairman, I see my job as twofold:
1) Facilitate communication, cooperation, and decisions.
2) Facilitate the implementation of decisions of the group.
Here's an agenda I've put together of things I think we
need to discuss. What items am I missing? This is nothing
more than my own personal agenda to start people thinking.
First, the development issues:
1) Identify what tests are available. So far, I know of
the contribution by Skef Wholey. I imagine there will be
others forthcoming once people get a chance to get them
organized. (Myself included).
2) Identify a central location to keep the files. We
need someone on the Arpanet to volunteer some space for
files of tests, written proposals, etc. Symbolics is
not on the main Arpanet currently, so we aren't a good
choice. Volunteers?
Is there anyone who cannot get to files stored on
the Arpanet? If so, please contact me, and I'll arrange
to get files to you via some other medium.
3) We need to consider the review process for proposed
tests. How do we get tests reviewed by other contributors?
We can do it by FTPing the files to the central repository
and broadcasting a request to evaluating it to the list.
Would people prefer some less public form of initial evaluation?
4) Test implementation tools. We have one message from Gary Brown
describing his tool. I have a tool written using flavors that I
hope to de-flavorize and propose. I think we would do well to standardize
in this area as much as possible.
5) Testing techniques. Again, Gary Brown has made a number of excellent
suggestions here. I'm sure we'll all be developing experience that we
can share.
6) What areas do we need more tests on?
And there are a number of political, procedural, and policy issues that
need to be resolved.
7) Trademark/copyright issues. At Monterey, DARPA volunteered to
investigate trademarking and copyrighting the validation suite.
RPG: have you heard anything on this?
8) How do we handle disagreements about the language? This was
discussed at the Monterey meeting, and I believe the answer is that, if
we can't work it out, we ask the Common Lisp mailing list, and
especially the Gang of Five, for a clarification. At any rate,
I don't believe it is in our charter to resolve language issues.
I expect we will IDENTIFY a lot of issues, however.
I don't think the rest of these need to be decided any time soon.
We can discuss them now, or we can wait.
9) How does a company (or University) get a Common Lisp implementation
validated, and what does it mean? We can discuss this now, but I
don't think we have to decide it until we produce our first validation
suite.
10) How do we distribute the validation suites? I hope we can do most
of this via the network. I am willing to handle distributing it to
people off the network until it gets too expensive in time or tapes.
We will need a longer-term solution to this, however.
11) Longer term maintenance of the test suites. I think having a
commercial entity maintain it doesn't make sense until we get the
language into a more static situation. I don't think there is
even agreement that this is the way it should work, for that
matter, but we have plenty of time to discuss this, and the situation
will be changing in the meantime.
So keep those cards and letters coming, folks!
∂12-Nov-84 1128 brown@DEC-HUDSON validation process
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 11:25:11 PST
Date: Mon, 12 Nov 84 14:26:14 EST
From: brown@DEC-HUDSON
Subject: validation process
To: cl-validation@su-ai
I am happy to see that another vendor (Symbolics) is interested in sharing
tests. I too believe we all have much to gain from this kind of cooperation.
Since it seems that we will be creating and running tests, I would like
to expand a bit on an issue I raised previously - the ethics of validation.
A lot of information, either explicit or intuitive, concerning the quality
of the various implementations will surely be passed around on this mailing
list. I believe that this information must be treated confidentially. I
know of two recent instances when perceived bugs in our implementation of
Common Lisp were brought up in sales situations. I cannot actively
participate in these discussions unless we all intend to keep this
information private.
I disagree with the last point in Bob's "Hello" mail - the long-term maintenance
of the test suite (however, I agree that we have time to work this out).
I believe that our recommendation should be that ARPA immediately fund a
third party to create/maintain/administer language validation.
One big reason is to guarantee impartiality and to protect ourselves.
If Common Lisp validation becomes a requirement for software on RFPs,
big bucks might be at stake, and we need to guarantee that the process is
impartial and, I think, we want a lot of distance between ourselves and
the validation process. I don't want to get sued by XYZ inc. because their
implementation didn't pass and this caused them to lose a contract and go
out of business.
Of course, if ARPA isn't willing to fund this, then we Common Lispers will
have to do something ourselves. It would be useful if we could get
some preliminary indication from ARPA about their willingness to fund
this type of effort.
∂12-Nov-84 1237 FAHLMAN@CMU-CS-C.ARPA validation process
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 12:36:09 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Mon 12 Nov 84 15:35:13-EST
Date: Mon, 12 Nov 1984 15:35 EST
Message-ID: <FAHLMAN.12063043155.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To: brown@DEC-HUDSON.ARPA
Cc: cl-validation@SU-AI.ARPA
Subject: validation process
In-reply-to: Msg of 12 Nov 1984 14:26-EST from brown at DEC-HUDSON
I don't see how confidentiality of validation results can be maintained
when the validation suites are publicly available (as they must be).
If DEC has 100 copies of its current Common Lisp release out in
customer-land, and if the validation programs are generally available to
users and manufacturers alike, how can anyone reasonably expect that
users will not find out that this release fails test number 37? I think
that any other manufacturer had better be without sin before casting the
first stone in a sales presentation, but certainly there will be some
discussion of which implementations are fairly close and which are not.
As with benchmarks, it will take some education before the public can
properly interpret the results of such tests, and not treat the lack of
some :FROM-END option as a sin of equal magnitude to the lack of a
package system.
The only alternative that I can see is to keep the validation suite
confidential in some way, available only to manufacturers who promise to
run it on their own systems only. I would oppose that, even if it means
that some manufacturers would refrain from contributing any tests that
their own systems would find embarrassing. It seems to me that making
the validation tests widely available is the only way to make them
widely useful as a standardization tool and as something that can be
pointed at when a contract wants to specify Common Lisp. Of course, it
would be possible to make beta-test users agree not to release any
validation results, just as they are not supposed to release benchmarks.
I agree with Gary that we probably DO want some organization to be the
official maintainer of the validation stuff, and that this must occur
BEFORE validation starts being written into RFP's and the like. We
would have no problem with keeping the validation stuff online here at
CMU during the preliminary development phase, but as soon as the lawyers
show up, we quit.
-- Scott
∂12-Nov-84 1947 fateman%ucbdali@Berkeley Re: validation process
Received: from UCB-VAX.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 19:47:22 PST
Received: from ucbdali.ARPA by UCB-VAX.ARPA (4.24/4.39)
id AA10218; Mon, 12 Nov 84 19:49:39 pst
Received: by ucbdali.ARPA (4.24/4.39)
id AA13777; Mon, 12 Nov 84 19:43:29 pst
Date: Mon, 12 Nov 84 19:43:29 pst
From: fateman%ucbdali@Berkeley (Richard Fateman)
Message-Id: <8411130343.AA13777@ucbdali.ARPA>
To: brown@DEC-HUDSON, cl-validation@su-ai
Subject: Re: validation process
I think that confidentiality of information on this mailing list is
unattainable, regardless of its desirability.
∂13-Nov-84 0434 brown@DEC-HUDSON Confidentiality loses
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 13 Nov 84 04:34:11 PST
Date: Tue, 13 Nov 84 07:35:21 EST
From: brown@DEC-HUDSON
Subject: Confidentiality loses
To: fahlman@cmu-cs-c
Cc: cl-validation@su-ai
I guess you are right. I can't expect the results of public domain tests
or the communications on this mailing list to be treated confidentially.
So, I retract the issue. I'll make sure that my own comments are not "sensitive".
-Gary
∂18-Dec-85 1338 PACRAIG@USC-ISIB.ARPA Assistance please?
Received: from USC-ISIB.ARPA by SU-AI.ARPA with TCP; 18 Dec 85 13:36:21 PST
Date: 18 Dec 1985 11:17-PST
Sender: PACRAIG@USC-ISIB.ARPA
Subject: Assistance please?
From: Patti Craig <PACraig@USC-ISIB.ARPA>
To: CL-VALIDATION@SU-AI.ARPA
Message-ID: <[USC-ISIB.ARPA]18-Dec-85 11:17:56.PACRAIG>
Hi,
Need some information relative to the CL-VALIDATION@SU-AI
mailing list. Would the maintainer of same please contact
me.
Thanks,
Patti Craig
USC-Information Sciences Institute
∂12-Mar-86 2357 cfry%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU Validation proposal
Received: from MC.LCS.MIT.EDU by SU-AI.ARPA with TCP; 12 Mar 86 23:56:26 PST
Received: from MOSCOW-CENTRE.AI.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 13 Mar 86 02:55-EST
Date: Thu, 13 Mar 86 02:54 EST
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Validation proposal
To: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Message-ID: <860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
We need to have a standard format for validation tests.
To do this, I suggest we hash out a design spec
before we get serious about assigning chapters to implementors.
I've constructed a system which integrates diagnostics and
hacker's documentation. I use it and it saves me time.
Based on that, here's my proposal for a design spec.
GOAL [in priority order]
To verify that a given implementation is or is not correct CL.
To aid the implementor in finding out the discrepancies between
his implementation and the agreed upon standard.
To supplement CLtL by making the standard more precise.
To provide examples for future CLtLs, or at least a format
for machine-readable examples, which will make it easier to
verify that the examples are, in fact, correct.
..... those below are of auxiliary importance
To facilitate internal documentation [documentation
used primarily by implementors while developing]
To give CL programmers a suggested format for diagnostics and
internal documentation. [I argue that every programmer of
a medium to large program could benefit from such a facility].
RELATION of validation code to CL
It should be part of yellow pages, not CL.
IMPLEMENTATION: DESIRABLE CHARACTERISTICS
small amount of code
uses a small, simple subset of CL so that:
1. implementors can use it early in the development cycle
2. It will depend on little and thus be more reliable.
[we want to test specific functions in a controlled way,
not the code that implements the validation software.]
We could, for example, avoid using:
macros,
complex lambda-lists,
sequences,
# reader-macros,
non-fixnum numbers
FEATURES & USER INTERFACE:
simple, uniform, lisp syntax
permit an easy means to test:
- all of CL
- all of the functions defined in a file.
- all of the tests for a particular function
- individual calls to functions.
Allow a mechanism for designating certain calls as
"examples" which illustrate the functionality of the
function in question. Each such example should have
-the call
-the expected result [potentially an error]
-an optional explanation string, i.e.
"This call errored because the 2nd arg was not a number."
----------
Here's an example of diagnostics for a function:
(test:test 'foo
'((test:example (= (foo 2 3) 5) "foo returns the sum of its args.")
;the above is a typical call and may be used in a manual along
;with the documentation string of the fn
(not (= (foo 4 5) -2))
;a diagnostic not worthy of being made an example of. There will
;generally be several to 10's of such calls.
(test:expected-error (foo 7) "requires 2 arguments")
;if the expression is evaled, it should cause an error
(test:bug (foo 3 'bar) "fails to check that 2nd arg is not a number")
;does not perform as it should. Such entries are a convenient place
;for a programmer to remind himself that the FN isn't fully debugged yet.
(test:bug-that-crashes (foo "trash") "I've GOT to check the first arg with numberp!")
))
TEST is a function which sequentially processes the elements of the
list which is its 2nd arg. If an entry is a list whose car is:
test:example evaluate the cadr. if result is non-nil
do nothing, else print a bug report.
test:expected-error evaluate the cadr. If it does not produce an error,
then print a bug report.
test:bug evaluate the cadr. It should return NIL or error.
If it returns NIL or errors, print a "known" bug report;
otherwise print a "bug fixed!" message.
[programmer should then edit the entry to not be wrapped in
a test:bug statement.]
test:bug-that-crashes Don't eval the cadr. Just print the
"known bug that crashes" bug report.
There's a bunch of other possibilities in this area, like:
test:crash-example don't eval the cadr, but use this in documentation
Any entry without a known car will just get evaled; if it returns nil or errors,
print a bug report. The programmer can then fix the bug, or wrap a
test:bug around the call to acknowledge the bug. This helps separate the
"I've seen this bug before" cases from the "this is a new bug" cases.
With an editor that permits evaluation of expressions [emacs and sons],
it's easy to eval single calls or the whole test.
When evaluating the whole test, a summary of what went wrong can be
printed at the end of the sequence like "2 bugs found".
I find it convenient to place calls to TEST right below the definition
of the function that I'm testing. My source code files are about
half tests and half code. I have set up my test function such that
it checks to see if it is being called as a result of being loaded
from a file. If so, it does nothing. Our compiler is set up to
ignore calls to TEST, so they don't get into compiled files.
I have a function called TEST-FILE which reads each form in the file.
If the form is a list whose car is TEST, the form is evaled, else the
form is ignored.
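A sketch of that function, under the same assumptions as the sketch above
(the file-handling details are invented), could be as simple as:
(defun test-file (pathname)
  ;; Read every top-level form in the file; evaluate only the
  ;; (test:test ...) forms and ignore everything else.
  (with-open-file (stream pathname :direction :input)
    (do ((form (read stream nil stream) (read stream nil stream)))
        ((eq form stream))
      (when (and (consp form) (eq (car form) 'test:test))
        (eval form)))))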
Some programmers prefer to keep tests in a separate file from the
source code that they are writing. This is just fine in my implementation,
except that a list of the source code files can't be used in
testing a whole system unless there's a simple mapping between
source file name and test file name.
It's easy to see how a function could read through a file and pull
out the examples [among other things].
Since the first arg to the TEST fn is mainly used to tell the user what
test is being performed, it could be a string explaining in more
detail the category of the calls below, i.e. "prerequisites-for-sequences".
Notice that to write the TEST function itself, you need not have:
macros, &optional, &rest, or &key working, features that minimal lisps
often lack.
Obviously this proposal could use creativity of many sorts.
Our actual spec should just define the file format, though, not
add fancy features. Such features can vary from implementation to
implementation, which will aid evolution of automatic diagnostics and
documentation software.
But to permit enough hooks in the file format, we need insight as to the potential
breadth of such a mechanism. Thus, new goals might also be a valuable
addition to this proposal.
FRY
∂13-Mar-86 1015 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86 10:12:38 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA03979; Thu, 13 Mar 86 10:12:11 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131812.AA03979@isi-vaxa.ARPA>
Date: 13 Mar 1986 1012-PST (Thursday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
<860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
Christopher,
Thanks for the suggestion. Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources. ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.
A single validation suite will eventually be constructed with the existing
tests as a starting point. Therefore, we will probably not seriously consider
a standard until we have examined this extant code. I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.
Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.
Etc.,
RB
∂13-Mar-86 1028 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86 10:28:21 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA04181; Thu, 13 Mar 86 10:27:56 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131827.AA04181@isi-vaxa.ARPA>
Date: 13 Mar 1986 1027-PST (Thursday)
To: Christopher Fry <cfry@MIT-OZ%MIT-MC.ARPA>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
<860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
Christopher,
Thanks for the suggestion. Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources. ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.
A single validation suite will eventually be constructed with the existing
tests as a starting point. Therefore, we will probably not seriously consider
a standard until we have examined this extant code. I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.
Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.
Etc.,
RB
P.S.
I had to change your address (see header) 'cuz for some reason our mail
handler threw up on the one given with your message.
∂17-Mar-86 0946 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 17 Mar 86 09:46:27 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA11654; Mon, 17 Mar 86 09:46:19 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603171746.AA11654@isi-vaxa.ARPA>
Date: 17 Mar 1986 0946-PST (Monday)
To: cfry%oz@MIT-MC.ARPA
Cc: cl-Validation@su-ai.arpa
Subject: Re: Validation proposal
In-Reply-To: Your message of Mon, 17 Mar 86 04:30 EST.
<860317043024.5.CFRY@DUANE.AI.MIT.EDU>
Thanks, and I look forward to seeing your tests. And yes, I'm sure that
interested parties will get to review the test system before its in place.
RB
------- End of Forwarded Message
∂19-Mar-86 1320 berman@isi-vaxa.ARPA Re: Validation Contributors
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 19 Mar 86 13:20:08 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA08917; Wed, 19 Mar 86 13:19:50 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603192119.AA08917@isi-vaxa.ARPA>
Date: 19 Mar 1986 1319-PST (Wednesday)
To: Reidy.pasa@Xerox.COM
Cc: Reidy.pasa@Xerox.COM, berman@isi-vaxa.ARPA, CL-Validation@su-ai.ARPA
Subject: Re: Validation Contributors
In-Reply-To: Your message of 19 Mar 86 11:29 PST.
<860319-112930-3073@Xerox>
As a matter of fact, in the end it WILL be organized parallel to the book.
For now I'm just gathering the (often extensive) validation suites that have
been produced at various sites. These will need to be evaluated before
assigning tasks to people who want to write some code for this. By that time
we will also have a standard format for these tests so that this new code will
fit in with the test manager.
Send messages to CL-VALIDATION@SU-AI.ARPA rather than the CL general list when
discussing this, unless it is of broader interest of course.
Thanks.
RB
∂27-Mar-86 1332 berman@isi-vaxa.ARPA Validation Distribution Policy
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 27 Mar 86 13:32:16 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA22595; Thu, 27 Mar 86 13:32:06 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603272132.AA22595@isi-vaxa.ARPA>
Date: 27 Mar 1986 1332-PST (Thursday)
To: CL-Validation@su-ai.arpa
Subject: Validation Distribution Policy
------- Forwarded Message
Return-Path: <OLDMAN@USC-ISI.ARPA>
Received: from USC-ISI.ARPA by isi-vaxa.ARPA (4.12/4.7)
id AA13746; Wed, 26 Mar 86 13:35:26 pst
Date: 26 Mar 1986 16:24-EST
Sender: OLDMAN@USC-ISI.ARPA
Subject: Validation in CL
From: OLDMAN@USC-ISI.ARPA
To: berman@ISI-VAXA.ARPA
Message-Id: <[USC-ISI.ARPA]26-Mar-86 16:24:40.OLDMAN>
Yes, we have tests and a manager. I have started the wheels
moving on getting an OK from management for us to donate them.
Is there a policy statement on how they will be used or
distributed available? ...
-- Dan Oldman
------- End of Forwarded Message
I don't recall any exact final statement of the type of access. I remember
there was some debate on whether it should be paid for by non-contributors,
but was there any conclusion?
RB
∂29-Mar-86 0819 FAHLMAN@C.CS.CMU.EDU Validation Distribution Policy
Received: from C.CS.CMU.EDU by SU-AI.ARPA with TCP; 29 Mar 86 08:19:13 PST
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Sat 29 Mar 86 11:19:51-EST
Date: Sat, 29 Mar 1986 11:19 EST
Message-ID: <FAHLMAN.12194592953.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@isi-vaxa.ARPA (Richard Berman)
Cc: CL-Validation@SU-AI.ARPA
Subject: Validation Distribution Policy
In-reply-to: Msg of 27 Mar 1986 16:32-EST from berman at isi-vaxa.ARPA (Richard Berman)
I don't recall any exact final statement of the type of access. I remember
there was some debate on whether it should be paid for by non-contributors,
but was there any conclusion?
I believe that the idea that free access to the validation code be used
as an incentive to get companies to contribute was discussed at the
Boston meeting, but finally abandoned as being cumbersome, punitive, and
not necessary. Most of the companies there agreed to contribute
whatever validation code they had, and/or some labor to fill any holes
in the validation suite, with the understanding that the code would be
pulled into a reasonably coherent form at ISI and then would be made
freely available to all members of the community. This release would
not occur until a number of companies had contributed something
significant, and then the entire collection up to that point would be
made available at once.
I believe that Dick Gabriel was the first to say that his company would
participate under such a plan, and that he had a bunch of conditions
that had to be met. If there are any not captured by the above
statement, maybe he can remind us of them.
-- Scott
∂16-Jun-86 1511 berman@isi-vaxa.ARPA Validation Suite
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 16 Jun 86 15:11:47 PDT
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA19003; Mon, 16 Jun 86 15:11:38 pdt
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8606162211.AA19003@isi-vaxa.ARPA>
Date: 16 Jun 1986 1511-PDT (Monday)
To: CL-VALIDATION@su-ai.arpa
Cc: berman@isi-vaxa.ARPA
Subject: Validation Suite
Well, now that some of the contributions to the Great Validation Suite have
begun to filter in, I have been asked to make a report for broad issue on 1
July summarizing the status of all the validation contributions.
I hope this is enough time so that everything can be whipped into shape.
Please do contact me regarding the status of your validation and how it's
progressing. If I haven't yet contacted you, please send me a message. You
may not be on my list. (Also, I cannot seem to reach a few of you via network
for whatever reason).
So...
I DO need your validation contributions.
We ARE putting together a master validation suite, once more of the
contributions arrive.
Thanks.
Richard Berman
USC/ISI
(213) 822-1511
∂09-Jul-86 1213 berman@vaxa.isi.edu Validation Control
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 9 Jul 86 12:09:58 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA27003; Wed, 9 Jul 86 12:09:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607091909.AA27003@vaxa.isi.edu>
Date: 9 Jul 1986 1209-PDT (Wednesday)
To: CL-Validation@SU-AI.ARPA
Cc:
Subject: Validation Control
Well, I've got quite a goodly collection of tests from which to construct a
first pass suite. Here's the situation: Each set of tests (from the various
vendors) uses its own control mechanism, usually in the form of some macro
surrounding a (set of) test(s). Some require an error handler.
By and large all tests take a similar form. Each is composed of a few parts:
1. A form to evaluate.
2. The desired result.
3. Some kind of text for error reporting.
Some versions give each test a unique name.
Some versions specify a test "type", e.g. evaltest means to evaluate the form,
errortest means the test should generate an error (and so the macro could
choose not to do anything with the test if no error handling is present).
What I am looking for is a simple and short proposal for how to
arrange/organize tests in the suite. Currently I am organizing according to
sections in CLtL. This isn't entirely sufficient, especially for some of the
changes that have been accepted since its publication.
So what kind of control/reporting/organizing method seems good to you?
As I am already organizing this, please do not delay. If enough inertia
builds up then whatever I happen to decide will end up as the first pass. So
get your tickets NOW!
RB
∂22-Jul-86 1344 berman@vaxa.isi.edu test control
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 22 Jul 86 13:44:07 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA16122; Tue, 22 Jul 86 13:44:01 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607222044.AA16122@vaxa.isi.edu>
Date: 22 Jul 1986 1343-PDT (Tuesday)
To: cl-validation@su-ai.arpa
Cc:
Subject: test control
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
4. N tests (or pairs of tests and expected results).
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
6. Test name. Unique for each test.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
8. Error string.
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
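No concrete syntax accompanies the proposal, so purely for concreteness here is
one invented shape a call covering the eight fields might take; DEFINE-TEST, its
keywords, and the toy expansion are illustrations only, not the proposed format.

(defmacro define-test (name &key contributor id type tests
                                 side-effects on-failure error-string)
  (declare (ignore contributor id type))   ; bookkeeping data in a real manager
  `(dolist (pair ,tests t)
     (unless (and (equal (eval (first pair)) (second pair))
                  (every (lambda (f) (eval f)) ,side-effects))
       (format t "~&~A failed: ~@[~A~]~%" ',name ,error-string)
       (eval ,on-failure)
       (return nil))))

(define-test car-of-list                          ; 6. unique test name
  :contributor "Example Corp."                    ; 1. contributor string
  :id 'car                                        ; 2. what is being tested
  :type :eval                                     ; 3. test type
  :tests '(((car '(a b c)) a)                     ; 4. form / expected-result pairs
           ((car nil) nil))
  :side-effects '((numberp pi))                   ; 5. forms that must be non-NIL
  :on-failure '(describe 'car)                    ; 7. form to eval on failure
  :error-string "CAR returned the wrong value.")  ; 8. canned error string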
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Best,
RB
∂23-Jul-86 2104 NGALL@G.BBN.COM Re: test control
Received: from BBNG.ARPA by SAIL.STANFORD.EDU with TCP; 23 Jul 86 21:03:48 PDT
Date: 24 Jul 1986 00:00-EDT
Sender: NGALL@G.BBN.COM
Subject: Re: test control
From: NGALL@G.BBN.COM
To: berman@ISI-VAXA.ARPA
Cc: cl-validation@SU-AI.ARPA
Message-ID: <[G.BBN.COM]24-Jul-86 00:00:45.NGALL>
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>
Date: 22 Jul 1986 1343-PDT (Tuesday)
From: berman@vaxa.isi.edu (Richard Berman)
To: cl-validation@su-ai.arpa
Subject: test control
Message-ID: <8607222044.AA16122@vaxa.isi.edu>
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
4. N tests (or pairs of tests and expected results).
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
6. Test name. Unique for each test.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
8. Error string.
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Best,
RB
--------------------
How about a field that indicates which revision of CL this test
applies to?
-- Nick
∂24-Jul-86 0254 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 02:53:02 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40517; Thu 24-Jul-86 05:55:50-EDT
Date: Thu, 24 Jul 86 05:54 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>
Message-ID: <860724055418.1.CFRY@DUANE.AI.MIT.EDU>
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
Nice to keep around. But won't you generally have a whole bunch of tests
in a file from 1 contributor? You shouldn't have to have their name
on every test.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
Please be more specific on what this means.
4. N tests (or pairs of tests and expected results).
Typically how large is N? 1, 10, 100, 1000?
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
Particularly for a large N, side effect testing should be textually adjacent to
whatever it's affecting.
6. Test name. Unique for each test.
This should be adjacent to test-id
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
Typically NIL? By "TEST" do you mean: if one of the above N fails, eval this form?
Should it be evaled for each of the N that fail?
8. Error string.
Similar to above?
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Above is not only ambiguous, but too abstract to get a feel for it.
Send us several examples, both typical and those at the extreme ranges of
size and complexity. I want to see the actual syntax.
Guessing at what you mean here, it looks like it's going to take someone a very
long time to make the tests in such a complex format.
And you lose potential flexibility.
My format distributes control much more locally to each form to be evaled.
And it allows for simple incremental add-ons for things you missed in the spec
the first time around. For example, the "EXPECT-ERROR" fn below is such an add-on.
It is not an integral part of the diagnostic-controller, which itself is
quite simple.
To re-iterate my plan:
There's a wrapper for a list of forms to evaluate, typically 5 to 20 forms.
Each form is evaled and if it returns NON-NIL, it passes.
Example:
(test '+
(= (+ 2 3) 5)
(expect-error (+ "2" "3")) ;returns T if the call to + errors
(setq foo (+ 1 2))
(= foo 3) ;tests side effect. The forms are expected to be evaled sequentially.
;anything that depends on a particular part of the environment to be "clean"
;before it tests something should have forms that clean it up first,
; like before the above call to setq you might say (makunbound 'foo)
(progn (bar) t) ; one way of testing a form where it is expected not to error
;but don't care if it returns NIL or NON-NIL. If you found you were using this
;idiom a lot, you could write DONT-CARE trivially, as an add-on.
)
If you really wanted to declare that a particular call tested a side-effect, or that
a particular call produced a side-effect, you could write a small wrapper fn for it,
but I'd guess that wouldn't be worth the typing. Such things should be obvious from
context.
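For what it is worth, the EXPECT-ERROR add-on used above might be written roughly
like this, assuming a Lisp with the (still unstandardized) condition system:

(defmacro expect-error (form)
  ;; T only if evaluating FORM signals an error, NIL otherwise.
  `(handler-case (progn ,form nil)
     (error () t)))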
Programmers are very reluctant to write diagnostics, so let's try to
make it as painless as possible. Maybe there could be some
macros that would fill in certain defaults of your full-blown format.
One of the things that's so convenient about my mechanism is that
a hacker can choose to, with a normal lisp text editor, eval part of
a call, a whole call, a group of calls [by selecting the region],
a whole TEST, or via my fn "test-file" a whole file.
[I also have "test-module" functionality for a group of files.]
Having this functionality makes the diagnostics more than just
a "validation" suite. It makes it a real programming tool.
And thus it will get used more often, and the tests themselves will
get performed more often.
This will lead to MORE tests as well as MORE TESTED tests, which
also implies that hackers/implementors will have more tested implementations,
which, after all, furthers the ultimate goal of having accurate
implementations out there.
.....
Before settling on a standard format, I'd also recommend just
converting a large file of tests into the proposed format
[before implementing the code that performs the test].
This will help you feel redundancies in the format
by noticing your worn out fingers.
But it will also help you see what parts of the syntax are
hard to remember and in need of more keywords or better named
functions, or less nested parens.
If the proposed format passes this test, it can be used as the
TEST code for the TEST software itself, as well as testing CL.
If not, you didn't waste time implementing a bad spec.
Despite the volume of my comments, I'm glad you're getting
down to substantial issues on what features to include.
CFry
∂24-Jul-86 1053 berman@vaxa.isi.edu Re: test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 10:50:55 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA04581; Thu, 24 Jul 86 10:49:05 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241749.AA04581@vaxa.isi.edu>
Date: 24 Jul 1986 1049-PDT (Thursday)
To: NGALL@G.BBN.COM
Cc: cl-validation@SU-AI.ARPA, berman@ISI-VAXA.ARPA
Subject: Re: test control
In-Reply-To: Your message of 24 Jul 1986 00:00-EDT.
<[G.BBN.COM]24-Jul-86 00:00:45.NGALL>
'Cuz the whole suite will be for a particular revision. There will
be no tests in the suite that do not apply to the particular level/revision.
RB
∂24-Jul-86 1148 marick%turkey@gswd-vms.ARPA Re: test control
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Jul 86 11:22:08 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA11926; Thu, 24 Jul 86 13:20:56 CDT
Message-Id: <8607241820.AA11926@gswd-vms.ARPA>
Date: Thu, 24 Jul 86 13:20:47 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Re: test control
I have trouble visualizing what a test looks like. Could you provide
examples?
Some general comments:
1. I hope that often-unnecessary parts of a test (like contributor
string, error string, form-to-evaluate-if-test-fails) are optional.
2. It would be nice if the test driver were useful for small-scale
regression testing. (That is, "I've changed TREE-EQUAL. O driver,
please run all the tests for TREE-EQUAL.") It seems you have this in
mind, but I just wanted to reinforce any tendencies.
3. The format of the database should be published, since people will
want to write programs that use it.
4. It's very useful to have an easy way of specifying the predicate to
use when comparing the actual result to the expected result.
The test suite ought to come with a library of such predicates.
5. I'd like to see a complete list of test types. What a test type is
is a bit fuzzy, but we have at least the following:
ordinary -- form evaluated and compared to unevaluated expected result.
(This is a convenience; you get tired of typing ')
eval -- form evaluated and compared to evaluated expected result.
fail -- doesn't run the test, just notes that there's an error. This
is used when an error breaks the test harness; it shouldn't
appear in the distributed suite, of course, but it will be
useful for people using the test suite in day-to-day regression
testing.
error -- the form is expected to signal an error; it fails if it does
not.
is-error -- if the form signals an error it passes. If it doesn't signal
an error, it passes only if it matches the "expected" result.
We use this to make sure that some action which is defined to
be "is an error" produces either an error or some sensible result.
It may not be appropriate for the official suite. (Note that there
really should be an evaluating and a non-evaluating version.)
6. Then you need to cross all those test types with a raft of issues
surrounding the compiler. Like:
a. For completeness, you should run the tests interpreted, compiled with
#'COMPILE, and compiled with #'COMPILE-FILE. (What COMPILE-FILE does
might not be a strict superset of what COMPILE does.)
b. Suppose you're testing a signalled error. What happens if the error
is detected at compile time? (This is something like the IS-ERROR case
above: either the compile must fail or running the compiled version
should do the same thing the interpreted version does.)
c. It may be the case that compiled code does less error checking than
interpreted code. OPTIMIZE switches can have the same effect. So you may
want to write tests that expect errors in interpreted code, but not in
compiled code. (This, again, is probably not relevant to the official test
suite, but, again, the easier it is to tune the test suite, the happier
implementors will be.)
7. What does the output look like? This test suite is going to be
huge, so it's especially important that you be able to easily find
differences between successive runs.
∂24-Jul-86 1546 berman@vaxa.isi.edu
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 12:38:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06466; Thu, 24 Jul 86 12:35:53 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241935.AA06466@vaxa.isi.edu>
Date: 24 Jul 1986 1235-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA
Cc: cl-validation@su-ai.arpa
Subject:
Let me clarify. First, I don't think this macro is used to control testing so
much as it is to help maintain the actual testing suite itself. The testing
suite is supposed to eventually incarnate under ISI's FSD data base facility,
as described in the proposal that I offered to one and all a short while back.
What this macro should do is allow me to build a test suite from amongst all
the tests. With that in mind:
1. I hope that often-unnecessary parts of a test (like contributor
string, error string, form-to-evaluate-if-test-fails) are optional.
Probably, except for contributor. The others can be NIL or created from other
data.
2. It would be nice if the test driver were useful for small-scale
regression testing. (That is, "I've changed TREE-EQUAL. O driver,
please run all the tests for TREE-EQUAL.") It seems you have this in
mind, but I just wanted to reinforce any tendencies.
Sure.
3. The format of the database should be published, since people will
want to write programs that use it.
Unlikely. See above re: FSD. It can't "be published" as it is just part of a
live environment.
4. It's very useful to have an easy way of specifying the predicate to
use when comparing the actual result to the expected result.
The test suite ought to come with a library of such predicates.
Well -- you could be a little more clear on this. Like what? Also, it is the
contributors who will write these tests. I imagine that most of the time an
EQ or EQUAL type would be used, and other less typical or special purpose
predicates will probably not be useful to other contributors.
5. I'd like to see a complete list of test types. What a test type is
is a bit fuzzy, but we have at least the following:
ordinary -- form evaluated and compared to unevaluated expected result.
(This is a convenience; you get tired of typing ')
eval -- form evaluated and compared to evaluated expected result.
fail -- doesn't run the test, just notes that there's an error. This
is used when an error breaks the test harness; it shouldn't
appear in the distributed suite, of course, but it will be
useful for people using the test suite in day-to-day regression
testing.
error -- the form is expected to signal an error; it fails if it does
not.
is-error -- if the form signals an error it passes. If it doesn't signal
an error, it passes only if it matches the "expected" result.
We use this to make sure that some action which is defined to
be "is an error" produces either an error or some sensible result.
It may not be appropriate for the official suite. (Note that there
really should be an evaluating and a non-evaluating version.)
Sounds to me like you got the idea. These are classifications of tests used
to control the testing process. In addition, this being a part of the
database, one could create a test suite for just certain classes of tests.
And as for compiler stuff--for now it will probably just allow you to test
each test interpreted, compiled or both (possibly not in the very first cut).
Other issues will be taken up as the suite develops.
7. What does the output look like? This test suite is going to be
huge, so it's especially important that you be able to easily find
differences between successive runs.
Each failing test will give some kind of report, identifying the test. As the
suite develops, more sophisticated reporting will be developed that fills the
needs of developers. How's that for using the word "develop" too much?
RB
∂24-Jul-86 1549 berman@vaxa.isi.edu test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 13:05:59 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06785; Thu, 24 Jul 86 13:03:54 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607242003.AA06785@vaxa.isi.edu>
Date: 24 Jul 1986 1303-PDT (Thursday)
To: cfy@OZ.AI.MIT.EDU
Cc: cl-validation@su-ai.arpa
Subject: test control
Please see my message to Marick, which answers some of your questions. As for
the others:
1. Contributor string. Who wrote/contributed it.
Nice to keep around. But won't you generally have a whole bunch of tests
in a file from 1 contributor? You shouldn't have to have their name
on every test.
Nope. The tests will be separated into the various sections of the book under
which the test best fits. These will then be assembled into a test for that
section. Note also Marick's comments re regression analysis.
3. Test type. E.g. Eval, Error, Ignore, etc.
Please be more specific on what this means.
See Marick's comments.
4. N tests (or pairs of tests and expected results).
Typically how large is N? 1, 10, 100, 1000?
I imagine N is very small. It should be what you could call a "testing unit"
which does enough to conclusively report success/failure of some specific
thing being tested.
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
Particularly for a large N, side effect testing should be textually adjacent to
whatever it's affecting.
Certainly would enhance readability/maintainability, etc.
6. Test name. Unique for each test.
This should be adjacent to test-id
Sure.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
Typically NIL? By "TEST" do you mean: if one of the above N fails, eval this form?
Should it be evaled for each of the N that fail?
Well, each thing wrapped by this macro should be a "testing unit" as above, so
if any of N fails the remaining tests in that macro probably won't be
executed, and this form will then be evaluated.
8. Error string.
Similar to above?
Not at all. This is what to say in the event of an error. It is optional
because a reporting mechanism can construct a message, but for more
readability or for other reasons (as deemed useful by the test implementor) a
canned string can be printed as well.
Above is not only ambiguous, but too abstract to get a feel for it.
Send us several examples, both typical and those at the extreme ranges of
size and complexity. I want to see the actual syntax.
Well, I hope this and other messages help that problem. As for syntax - until
it is implemented, there isn't any. If you still don't see why this data is
needed, or if it isn't clear about the "database" stuff I mentioned, please
call me.
Guessing at what you mean here, it looks like it's going to take someone a very
long time to make the tests in such a complex format.
And you lose potential flexibility.
I couldn't disagree more. I have received a great deal of testing material
and this is not much more "complex" than most. It actually allows (in
conjunction with the testing database) a far more flexible testing regimen
than any I've seen.
(As for your methodology -- it has much merit. Perhaps my use of parts of it
is too disguised here?)
Programmers are very reluctant to write diagnostics, so let's try to
make it as painless as possible. Maybe there could be some
macros that would fill in certain defaults of your full-blown format.
Only new contributions need to be in this format. I would expect a wise
programmer to come up with a number of ways to automate this. I for one would
not type my company name (contributor ID) for each one.
One of the things that's so convenient about my mechanism is that
a hacker can choose to, with a normal lisp text editor, eval part of
a call, a whole call, a group of calls [by selecting the region],
a whole TEST, or via my fn "test-file" a whole file.
[I also have "test-module" functionality for a group of files.]
Having this functionality makes the diagnostics more than just
a "validation" suite. It makes it a real programming tool.
And thus it will get used more often, and the tests themselves will
get performed more often.
This will lead to MORE tests as well as MORE TESTED tests, which
also implies that hackers/implementors will have more tested implementations,
which, after all, furthers the ultimate goal of having accurate
implementations out there.
Certainly one goal is to make the tests useful. We hope to have an online
(via network) capability for testers to request their own test suites, as
customized as we can. For others, a testing file can be generated. Have you
read the ISI proposal for CL support?
.....
Before settling on a standard format, I'd also recommend just
converting a large file of tests into the proposed format
[before implementing the code that performs the test].
Am doing that now, with the CDC test suite.
This will help you feel redundancies in the format
by noticing your worn out fingers.
But it will also help you see what parts of the syntax are
hard to remember and in need of more keywords or better named
functions, or less nested parens.
You bet.
If the proposed format passes this test, it can be used as the
TEST code for the TEST software itself, as well as testing CL.
If not, you didn't waste time implementing a bad spec.
As with any large (and many smaller) systems, the test suite will go through
the various stages of incremental development. I'm sure we'll discard a
paradigm or two on the way.
Despite the volume of my comments, I'm glad you're getting
down to substantial issues on what features to include.
CFry
Thank you.
I hope this is helpful.
RB
∂24-Jul-86 1740 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 17:22:22 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 24 Jul 86 20:22:36-EDT
Date: Thu, 24 Jul 1986 20:22 EDT
Message-ID: <FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 24 Jul 1986 15:35-EDT from berman at vaxa.isi.edu (Richard Berman)
Maybe I should have read the earlier proposal more carefully. This
"incarnate in FSD" business sounds scary.
I had the impression that FSD was an internal tool that you would be
using to maintain the validation suite, but that the validation suite
itself would be one or more Common Lisp files that you can pass out to
people who want to test their systems. Is that not true? (This is
separate from the issue of whether validation is done at ISI or
elsewhere; the point is that it should be possible to release the test
suite if that's what we want to do.) I would hope that the testing code
can be passed around without having to pass FSD around with it (unless
FSD is totally portable and public-domain).
-- Scott
∂25-Jul-86 0047 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 00:47:11 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40592; Fri 25-Jul-86 03:50:09-EDT
Date: Fri, 25 Jul 86 03:47 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607242003.AA06785@vaxa.isi.edu>
Message-ID: <860725034736.3.CFRY@DUANE.AI.MIT.EDU>
I apologize for not including the text of the messages I'm replying
to here. Since it's more than one, I have a hard time integrating them.
.......
Sounds like you're basically doing the right stuff, but I still
don't see why you don't present us with an example.
You mentioned that you wouldn't have one until the implementation
was complete, then you said you were converting the CDC tests
already. ???
I surmise that ISI will be using some fancy database format such that you'll
need some hairy hardware and software even to get ASCII out of it.
But the interface to that will, I hope, be files containing
lisp expressions, that can be read with the reader and maybe even
tested by evaling them as is or with some modification.
It's this format that I'd like to see an example of.
There was a question about a published spec that you dodged.
I presume there will be a fixed format, and we'll all want to use it.
Since everybody is going to want to use certain "macros" for helping them
manipulate the stuff, can't we just standardize on those too?
To refer to the original issue,
when an implementor sends you a file, it should say just once
at the top of the file who wrote the tests, and what version of CL
they apply to. Actually a list of versions or range of versions may be more
appropriate.
Since it will be a smaller and less controversial amount of code, we can
just standardize on your implementation rather than haggle over
English descriptions, though I hope your implementation will at least
include doc strings. Will this code be Public Domain, or at least
given out to test contributors?
In a bunch of cases you refer to giving a test form and including
an expected value. The issue arises, how do you compare the two?
My mechanism just uses the full power of CL to do comparisons
in the most natural way. There are not 2 parts to a call,
there's just one. And the kind of comparison is integral with
the call, e.g.: (eq foo foo)
(not (eq foo bar))
(= 1 1.0)
(equalp "foo" "FOO")
There are lots of comparisons, so don't try to special case each one.
When an error system is settled upon, I hope there will be an errorp fn.
Of course, this ends up testing "EQ" at the same time it tests "FOO",
but I think that's, in general, unavoidable.
Anyway if EQ is broken, the implementation doesn't have much of a chance.
You said that each form of a group would be tested and when the first
one fails, you stop the test and declare that "REDUCE" or whatever is
broken. I think we can provide higher resolution than that without
much cost, i.e. (reduce x y z) is broken.
Such resolution will be very valuable to the bug fixer, and even
for someone evaluating the language. Since you dodged my
question of "How big is N" by saying "very small" instead of
1 -> 5 or whatever, I can't tell what resolution your mechanism
is really going to provide.
∂25-Jul-86 1036 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 10:36:30 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA16963; Fri, 25 Jul 86 10:35:43 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251735.AA16963@vaxa.isi.edu>
Date: 25 Jul 1986 1035-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Thu, 24 Jul 1986 20:22 EDT.
<FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>
FSD will be used to maintain a number of things relating to our support of CL.
It need not be distributed itself. The intended use is to help order and keep
track of the various tests. For example, there may be tests which are
questionable. They would be in the database, but not readily accessible for
the purposes of making a test file until they were verified.
Yes, of course it is files that will be distributed. FSD can be used to help
create the testing files. I did note on the proposal (which I did not author)
that ISI intends to send a "team" to do the validation at the manufacturer's
site. Exactly why (except for official reporting) I don't know.
The test suite, as "incarnated" in FSD, will exist as a bunch of objects, each
of which represents a test and some data about the test. There are not really
files, as such, in FSD.
If this still sounds scary, let me know. One of the purposes of all this is
to eventually allow network access to this database (and for other purposes).
RB
∂25-Jul-86 1051 berman@vaxa.isi.edu Re: test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 10:50:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17112; Fri, 25 Jul 86 10:49:50 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251749.AA17112@vaxa.isi.edu>
Date: 25 Jul 1986 1049-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: test control
In-Reply-To: Your message of Fri, 25 Jul 86 03:47 EDT.
<860725034736.3.CFRY@DUANE.AI.MIT.EDU>
I sort of thought the notion of a "test unit" would communicate the "N" you
refer to. Let me be more specific. N is 1. But there may be more than one
form. N here refers to the number of tests of the function/topic being
tested. Other forms can set things up, etc. If any form fails, it is THAT
TEST that is reported to have failed, not the entirety of the function/topic.
As for the conversion -- I am mostly working with my organizing database (the
one that will be used to help order the tests) with the CDC stuff as a test
case.
I would sure like to hear more ideas, and from others too. I think now that I
would modify this testing macro a bit. I think the "test" proper is in 3
parts. A setup, the actual test form, and an un-setup. Obviously only the
test form is required.
I do somewhat like the idea of just using a lisp-form, and if it is supposed
to return some result, just ensure it returns non-nil for "OK". That is,
using your simpler (pred x y) where pred tests the result, x is the test form,
and y is the desired result. I still would like to formalize it somewhat into
something that more clearly shows which is the test form and the required
result, as well as the predicate. See some of the test classes that Marick
describes. Not all of them care for a result, and I would like that to be
more explicit from the layout of the test text.
I am sorry you feel I am being evasive. I could just make arbitrary
decisions, but in fact I am relaying all the information, ideas and activities
as they actually are.
RB
∂25-Jul-86 1111 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 11:10:53 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 14:10:54-EDT
Date: Fri, 25 Jul 1986 14:10 EDT
Message-ID: <FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986 13:35-EDT from berman at vaxa.isi.edu (Richard Berman)
That all sounds fine, as long as you people at ISI are able to cause FSD
to create a file that represents a portable test suite with the
parameters you specify (version of Common Lisp, what areas tested, etc.)
If people can come in over the net and produce such portable files for
their own use, so much the better.
-- Scott
∂25-Jul-86 1127 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 11:23:13 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17533; Fri, 25 Jul 86 11:22:38 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251822.AA17533@vaxa.isi.edu>
Date: 25 Jul 1986 1122-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986 14:10 EDT.
<FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>
That's my feeling too. By the way, when you say "versions of common lisp",
just what do you mean? Are there officially recognized versions? Or is all
ongoing activity still towards a version 1?
Thanks.
RB
∂25-Jul-86 1254 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 12:54:23 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 15:54:18-EDT
Date: Fri, 25 Jul 1986 15:54 EDT
Message-ID: <FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986 14:22-EDT from berman at vaxa.isi.edu (Richard Berman)
The assumption is that once we have ANSI/ISO approval for one version,
there will be updates to the standard at periodic and not-too-frequent
intervals.
-- Scott
∂25-Jul-86 1541 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 15:40:42 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA20104; Fri, 25 Jul 86 15:39:31 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607252239.AA20104@vaxa.isi.edu>
Date: 25 Jul 1986 1539-PDT (Friday)
To: Fahlman@C.CS.CMU.EDU
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986 15:54 EDT.
<FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>
Thanks, that clears it up for me.
RB
∂26-Jul-86 1447 marick%turkey@gswd-vms.ARPA Test suite
Received: from GSWD-VMS.ARPA by SU-AI.ARPA with TCP; 26 Jul 86 14:47:39 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA15192; Sat, 26 Jul 86 16:49:06 CDT
Message-Id: <8607262149.AA15192@gswd-vms.ARPA>
Date: Sat, 26 Jul 86 16:49:02 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu
Cc: cl-validation@su-ai.arpa
In-Reply-To: berman@vaxa.isi.edu's message of 24 Jul 1986 1235-PDT (Thursday)
Subject: Test suite
Equality predicates (mostly digression on test-case syntax):
In any test, you'll have to write down the test case, the expected
results, and the way you test the expected vs. actual results.
The obvious way to do it is
(eq (car '(a b c)) 'a)
The way we do it (a way derived from something the DEC people put in
this mailing list a long time ago) is
( (car '(a b c)) ==> a)
Where the match predicate is implicit (EQUAL). I like this way better
because it breaks a test down into distinct parts. That makes it
easier, for example, to print an error message like
"Test failed with actual result ~A instead of expected result ~A~%".
If a test is just a lisp form, it will usually look like
(<match-pred> <test-case> <expected-results>), but "usually" isn't enough.
Once you've got test-forms broken down into separate parts, it just
turns out to be convenient to have one of the parts be the match
function and another to be the type of the test (evaluating,
non-evaluating, error-expecting, etc.)
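A rough sketch of running one test written in that three-part style (invented
here, not GSD's actual driver; the match predicate is the implicit EQUAL and the
expected result is not evaluated):

(defun run-arrow-test (test)
  ;; TEST is a three-part form such as ((car '(a b c)) ==> a).
  (destructuring-bind (test-case arrow expected) test
    (assert (string= (symbol-name arrow) "==>"))
    (let ((actual (eval test-case)))
      (or (equal actual expected)
          (format t "~&Test failed with actual result ~A instead of expected result ~A~%"
                  actual expected)))))

;; e.g. (run-arrow-test '((car '(a b c)) ==> a))  =>  T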
Compilation:
I wouldn't put off worrying about issues surrounding compilation.
We did just that, and I'm not pleased with the result. These issues
will affect the whole structure of the test driver, I think, and
ignoring them will, I fear, either lead to throwing away the first
version or living with inadequacy.
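As one hedged illustration of why compilation shapes the driver, a harness might
run each test form through both EVAL and COMPILE and compare the two (this sketch
ignores COMPILE-FILE, multiple values, and forms with side effects):

(defun run-form-both-ways (form)
  ;; Evaluate FORM directly, then again after compiling it, and compare results.
  (let ((interpreted (eval form))
        (compiled (funcall (compile nil `(lambda () ,form)))))
    (unless (equal interpreted compiled)
      (format t "~&Compiled/interpreted mismatch for ~S: ~S vs. ~S~%"
              form interpreted compiled))
    (and interpreted compiled)))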
∂28-Jul-86 1122 berman@vaxa.isi.edu Re: Test suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 28 Jul 86 11:21:23 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA07088; Mon, 28 Jul 86 11:19:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607281819.AA07088@vaxa.isi.edu>
Date: 28 Jul 1986 1119-PDT (Monday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa
Subject: Re: Test suite
In-Reply-To: Your message of Sat, 26 Jul 86 16:49:02 CDT.
<8607262149.AA15192@gswd-vms.ARPA>
I agree about making the testing predicate a separate part of the test form.
This may become more useful for both analysis and test generation at some
point.
As for compilation -- in the test managers I have received, one generally has
the option of running the tests interpreted, compiled, or both. There is not
a compile-file option as yet. I suspect that compile-file should be its own
test, rather than a form of testing. That is, there will undoubtedly be a
mini-suite for testing just compile-file. As well, there should be a general
sub-suite for testing all forms of compilation. While it is ad-hoc to test
the compiler by compiling tests not intended to test the compiler, I freely
admit that more subtle bugs are likely to be revealed in this manner for the
very reason that the tests were not intended specifically for compilation.
Also, there are implementations that only compile, such as ExperLisp.
RB
∂29-Jul-86 1220 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Re: test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86 10:34:18 PDT
Received: from MACH.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 41040; Tue 29-Jul-86 03:32:12-EDT
Date: Tue, 29 Jul 86 03:31 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607251749.AA17112@vaxa.isi.edu>
Message-ID: <860729033120.1.CFRY@MACH.AI.MIT.EDU>
I sort of thought the notion of a "test unit" would communicate the "N" you
refer to. Let me be more specific. N is 1. But there may be more than one
form. N here refers to the number of tests of the function/topic being
tested. Other forms can set things up, etc. If any form fails, it is THAT
TEST that is reported to have failed, not the entirety of the function/topic.
sounds good.
I would sure like to hear more ideas, and from others too. I think now that I
would modify this testing macro a bit. I think the "test" proper is in 3
parts. A setup, the actual test form, and an un-setup. Obviously only the
test form is required.
I usually consider un-setup to be part of the "setup".
Say a previous test does a (setq foo ...), and the next test is testing
whether boundp works. As part of the setup I would do (makunbound 'foo).
This means that the current test will not have to rely on everybody else doing
the un-setup properly, which is probably what you have to do anyway.
If all of the unsetups work correctly, then the env should be the same before the
test as it is after, right? This is an awful lot of work you're cutting out for yourself.
My proposals in general take into heavy consideration making it easy to write tests,
and making the diagnostic-controlling program for the tests itself
work with just a minimal amount of lisp functioning. It sounds like you're
not operating under the same constraints, but users of the validation suite will be.
I do somewhat like the idea of just using a lisp-form, and if it is supposed
to return some result, just ensure it returns non-nil for "OK". That is,
using your simpler (pred x y) where pred tests the result, x is the test form,
and y is the desired result. I still would like to formalize it somewhat into
something that more clearly shows which is the test form and the required
result, as well as the predicate. See some of the test classes that Marick
describes. Not all of them care for a result, and I would like that to be
more explicit from the layout of the test text.
Ok, I recognize that it's nice to be able to find out the various parts of the test,
rather than just have this amorphous lisp expression that's supposed to return non-nil.
Here's a modified approach that I think will satisfy both of us.
A cheap tester can just evaluate the test and expect to get non-nil.
Most forms will be of the type (pred expression expected-value).
That's pretty simple to parse for error messages and such.
For the don't-care-about-value case, have a function called:
ignore-value.
(defun ignore-value (arg)
  ;; ARG has already been evaluated by the caller; its value doesn't matter.
  (declare (ignore arg))
  t)
If you really need to get explicit, have a function called:
make-test
A call looks like:
(make-test pred exp expected-value
 &key test-id author site-name setup-form un-setup-form error-message compilep ...)
make-test is not quite the right word, because I think evaling it would
perform the test, not just create it. Maybe we should call it
perform-test instead.
If you really want to give a test a name, there could be a fn
def-test whose args are the same as make-test except that inserted at
the front is a NAME.
Anyway the idea is that some hairy database program
can easily go into the call and extract out all the relevant info.
[actually, its not even so hairy:
-setup and unsetup default to NIL.
-if non-list, pred defaults to EQUAL, expected-value defaults to non-nil
-if list, whose car is not DEF-TEST, pred is car,
exp is cadr and expected-value is caddr.
-if list whose car is DEF-TEST, parse as is obvious.]
But some simple program can just run it and it'll do mostly what you want.
the &key args can have appropriate defaults like *site-name* and
*test-author-name*.
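Those defaults amount to something like the little parser below (a guess at the
intent; the DEF-TEST branch is left as a stub since its argument order was only
sketched above):

(defun parse-test-form (form)
  ;; Returns three values: predicate, expression, expected value.
  ;; :NON-NIL marks the "any non-NIL value passes" default.
  (cond ((not (consp form))
         (values 'equal form :non-nil))
        ((eq (car form) 'def-test)
         (values :def-test form nil))    ; hand the whole DEF-TEST form to a fancier parser
        (t
         (values (first form) (second form) (third form)))))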
My point here is let's use the lisp reader and evaluator, not construct
a whole new language with its own syntax with "==>" infix operators,
special names for predicates that duplicate existing cl fns, and such.
Lisp is hip! That's why we're bothering to implement it in the first place!
As for explicit error messages, using:
"The form ~s evaled to ~s but the expected value was ~s."
Seems pretty complete to me. Nothing in my new proposal makes it hard to
implement such an error message.
I am sorry you feel I am being evasive. I could just make arbitrary
decisions, but in fact I am relaying all the information, ideas and activities
as they actually are.
Thanks for your concern. Actually I didn't think you were trying to be evasive,
it's just that you didn't consider that designing the syntax first can often simplify
homing in on the exact functionality of the program.
.....
I haven't thought very hard about being able to use the
same test for both compiling and evaling the expression in question.
I agree with whoever said that this should be worked out.
In my above make-test call, I have a var for compilep.
This could take the values T, NIL, or :BOTH, and maybe even
default to :BOTH.
∂29-Jul-86 1629 berman@vaxa.isi.edu Add to list
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86 11:11:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17440; Tue, 29 Jul 86 11:11:33 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607291811.AA17440@vaxa.isi.edu>
Date: 29 Jul 1986 1111-PDT (Tuesday)
To: CL-Validation@SU-AI.ARPA
Cc: Cornish%bravo@ti-csl@CSNET-RELAY.ARPA
Subject: Add to list
I am forwarding the message here I received to the correct person.
RB
------- Forwarded Message
Return-Path: <CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA>
Received: from CSNET-RELAY.ARPA (csnet-pdn-gw.arpa) by vaxa.isi.edu (4.12/4.7)
id AA11007; Mon, 28 Jul 86 17:09:37 pdt
Received: from ti-csl by csnet-relay.csnet id ar02252; 28 Jul 86 19:56 EDT
Received: from Bravo (bravo.ARPA) by tilde id AA12392; Mon, 28 Jul 86 17:08:11 cdt
To: berman@vaxa.isi.edu
Cc:
Subject: CL Validation Mailing List
Date: 28-Jul-86 17:05:11
From: CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA
Message-Id: <CORNISH.2731961109@Bravo>
I would like to be added to the CL Validation Suite mailing list.
------- End of Forwarded Message
∂31-Jul-86 0834 marick%turkey@gswd-vms.ARPA Lisp conference
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 31 Jul 86 08:34:35 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA00287; Thu, 31 Jul 86 10:34:01 CDT
Message-Id: <8607311534.AA00287@gswd-vms.ARPA>
Date: Thu, 31 Jul 86 10:33:56 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Lisp conference
Several people interested in CL validation will be at the Lisp
conference. Perhaps it would be a good idea if Richard Berman were to
buy us all lunch. Failing that, perhaps we should go to lunch on our
own tab -- or othertimewise get together.
Brian Marick
∂31-Jul-86 1034 berman@vaxa.isi.edu Re: Lisp conference
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 31 Jul 86 10:34:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA05383; Thu, 31 Jul 86 10:33:44 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607311733.AA05383@vaxa.isi.edu>
Date: 31 Jul 1986 1033-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Re: Lisp conference
In-Reply-To: Your message of Thu, 31 Jul 86 10:33:56 CDT.
<8607311534.AA00287@gswd-vms.ARPA>
As for Richard Berman buying lunch - I don't know how ISI would feel about
that, but I'll check. I am trying to prune my stay to one day, so which day
should it be? I really need to know by today if possible, or Friday morning
at worst. Based on the responses of those interested in the validation
effort, I will decide how long (and which day(s)) to stay.
So when would y'all like to get together?
Best,
RB
∂01-Aug-86 1348 berman@vaxa.isi.edu Conference
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 1 Aug 86 13:48:06 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA15784; Fri, 1 Aug 86 13:47:46 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608012047.AA15784@vaxa.isi.edu>
Date: 1 Aug 1986 1347-PDT (Friday)
To: cl-validation@su-ai.arpa
Cc:
Subject: Conference
Hey gang, I'm going to be at the conference to meet with any and all parties
interested in the Validation effort. I may only be around on Monday (but
Tuesday is a possibility) and I would like to meet for lunch after the morning
session. I assume I'll be wearing some kind of ID badge to identify myself as
Richard Berman from ISI.
I'll bring along a few hardcopies of the ISI proposal outlining our intended
support activities.
I really would like to meet everyone who is working on testing implementations
and other issues like this.
See ya.
RB
∂11-Aug-86 1122 berman@vaxa.isi.edu Thanks
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 11 Aug 86 11:22:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02567; Mon, 11 Aug 86 11:23:02 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608111823.AA02567@vaxa.isi.edu>
Date: 11 Aug 1986 1122-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Thanks
Thanks to the folks I spoke with at the conference. The main thing I got from
this is the concept of an ordering macro to facilitate test groups which must
execute in a specific sequence.
I would like to know whether there are any more comments, questions, suggestions,
etc. regarding the test macro.
RB
∂13-Aug-86 1130 berman@vaxa.isi.edu Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 13 Aug 86 11:29:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA10629; Wed, 13 Aug 86 11:30:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608131830.AA10629@vaxa.isi.edu>
Date: 13 Aug 1986 1130-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc:
Subject: Test Control
On 29 July Fry proposed a control scheme including a "compilep" option which
would be T, Nil or :BOTH, possibly defaulting to :BOTH. This would be present
for each test.
I feel that this is unnecessary because Common Lisp is supposed to yield the
same results compiled or interpreted. At least, that is my understanding. Are
there any intentional instances where this is not true?
Each test (or ordered series of tests) should be runnable in either form, so I
believe the control for testing compilation should be more global.
What do you think?
RB
∂19-Aug-86 0039 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Test Control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 00:39:39 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43318; Tue 19-Aug-86 03:41:06-EDT
Date: Tue, 19 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Test Control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8608131830.AA10629@vaxa.isi.edu>
Message-ID: <860819034224.7.CFRY@JONES.AI.MIT.EDU>
On 29 July Fry proposed a control scheme including a "compilep" option which
would be T, Nil or :BOTH, possibly defaulting to :BOTH. This would be present
for each test.
I feel that this is unnecessary because Common Lisp is supposed to yield the
same results compiled or interpreted. At least, that is my understanding. Are
there any intentional instances where this is not true?
Well, modulo some recent debate, macro-expand time is different.
Effectively, macro-expand time for compiled functions is the same as definition time.
But for evaled fns, macro-expand time is the same as run time.
But basically you're right. So long as we can easily run a whole set of tests
either evaled, compiled, or both, we don't need to indicate that in each test.
The error messages should definitely say whether the call failed in compiled or
evaled mode.
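A tiny illustration of that macro-expansion point (the names are made up, and
the interpreted behavior described is the common one rather than something the
language guarantees):
(defvar *phase* 'compile-time)

(defmacro phase-when-expanded ()
  ;; the expansion bakes in whatever *PHASE* is at macro-expansion time
  `',*phase*)

(defun probe () (phase-when-expanded))
(compile 'probe)          ; for the compiled definition, expansion happens here

(setq *phase* 'run-time)
(probe)                   ; => COMPILE-TIME when compiled; an interpreter that
                          ;    expands at call time would return RUN-TIME instead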
Each test (or ordered series of tests) should be runnable in either form, so I
believe the control for testing compilation should be more global.
What do you think?
RB
In my diagnostic system, I'd like to have the local control.
One reason is so that I can explicitly label a test that has a bug in it.
[and maybe only the compiled version of a call would have the bug.]
If there were a convenient syntax for declaring a test
evaled, compiled, both, or under global control [with global control being the default,
and with BOTH being the global-control's default]
then I'd make use of it.
∂19-Aug-86 1135 berman@vaxa.isi.edu Re: Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 11:35:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA12279; Tue, 19 Aug 86 11:35:26 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608191835.AA12279@vaxa.isi.edu>
Date: 19 Aug 1986 1135-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Subject: Re: Test Control
In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
<860819034224.7.CFRY@JONES.AI.MIT.EDU>
Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
specification of compiled, evaled or both (for testing), I like it.
I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
:COMPILE or :EVAL, where :GLOBAL means that the global test controller will
decide whether the test is compiled and/or evaled, and the other two values are
a "compile only" or "eval only" specifier, overriding the global control. I
don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
meaning that the test may be compiled and/or evaled.
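In a test definition this might read roughly as follows (the surrounding syntax
is not settled; only the :CONTROL part is the actual proposal):
(deftest (+ plus-fixnum-1)
  ((+ 1 2) eql 3)
  :control :global)        ; or :compile / :eval to override the global controller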
NOTE: I am experimenting with a macro now that includes all the best features
we seem to have agreed upon. I am including the above feature, but naturally
it can be changed. In a few days I will post this preliminary macro. It is
not really a control macro, but simply defines the test in terms of the data
base. Currently I am using generic common-lisp for this organizing macro, and
I am not using FSD. Instead it creates a simpler database using lists, arrays
and property lists. This database is for testing only and the actual
organizing macro may stray from pure CL because it is intended for internal
use only. Of course, the files generated from the database will contain only
"pure" CL for testing purposes.
RB
∂20-Aug-86 0604 hpfclp!hpfcjrd!diamant@hplabs.HP.COM Re: Test Control
Received: from HPLABS.HP.COM by SAIL.STANFORD.EDU with TCP; 20 Aug 86 06:03:39 PDT
Received: by hplabs.HP.COM ; Wed, 20 Aug 86 04:43:35 pdt
From: John Diamant <hpfclp!hpfcjrd!diamant@hplabs.HP.COM>
Received: from hpfcjrd.UUCP; Tue, 19 Aug 86 13:26:12
Received: by hpfcjrd; Tue, 19 Aug 86 13:26:12 mdt
Date: Tue, 19 Aug 86 13:26:12 mdt
To: cl-validation@sail.stanford.edu
Subject: Re: Test Control
> Subject: Test Control
> From: Christopher Fry <hplabs!cfry@OZ.AI.MIT.EDU>
>
> Well, modulo some recent debate, macro-expand time is different.
> Effectively, macro-expand time for compiled functions is the same as definition time.
> But for evaled fns, macro-expand time is the same as run time.
For evaled functions, it is unspecified in Common Lisp. This has been
discussed at great length on the CL mailing list, so I won't repeat it here,
but this is a potential source for problems in test runs. If an implementation
chooses to handle macro expansion the way you suggest (most do), then the
semantics truly are different. On our implementation, where we chose to
have consistent interpreter and compiler semantics with regard to
macroexpansion, any problems we encountered with expansion time were the same
whether we ran interpreted or compiled.
John Diamant
Systems Software Operation UUCP: {ihnp4!hpfcla,hplabs}!hpfclp!diamant
Hewlett Packard Co. ARPA/CSNET: diamant%hpfclp@hplabs.HP.COM
Fort Collins, CO
∂21-Aug-86 1352 berman@vaxa.isi.edu Purpose of Test Suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86 13:52:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA29758; Thu, 21 Aug 86 13:53:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608212053.AA29758@vaxa.isi.edu>
Date: 21 Aug 1986 1353-PDT (Thursday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Purpose of Test Suite
Now that I have been experimenting a bit, I have come up against a question
that is a bit difficult to decide upon. From my understanding, I am putting
together a VALIDATION suite, the purpose of which is to determine the presence
and operating status of all the CL functions, variables, features, etc.
Is it also supposed to thoroughly test these things?
That is, is this same suite responsible for determining such things as correct
operation at boundary conditions? How about esoteric interactions?
In the test of the "+" operation, what would you include? Obviously you want
to be sure that it works for each data type (and combination of data types)
that it is defined for. Also you want to make sure that positive/negative is
handled, etc. Beyond that, should it also check whether, for example,
MOST-POSITIVE-FIXNUM + 1 causes an error? And whether (+ 1 (1-
MOST-POSITIVE-FIXNUM)) causes no error? And so on for each of the number-type
boundaries.
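Concretely, the forms in question would be something along these lines (whether
they belong in the suite is the open question; under CLtL, integer arithmetic is
unbounded, so presumably the suite would check that these all return T without
signalling an error):
(= (+ most-positive-fixnum 1) (1+ most-positive-fixnum))    ; crosses into bignum range
(> (+ most-positive-fixnum 1) most-positive-fixnum)
(= (+ 1 (1- most-positive-fixnum)) most-positive-fixnum)    ; stays a fixnum
(typep (+ 1 (1- most-positive-fixnum)) 'fixnum)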
RB
∂21-Aug-86 1738 FAHLMAN@C.CS.CMU.EDU Purpose of Test Suite
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86 17:38:45 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 21 Aug 86 20:37:13-EDT
Date: Thu, 21 Aug 1986 20:37 EDT
Message-ID: <FAHLMAN.12232694383.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: CL-Validation@SU-AI.ARPA
Subject: Purpose of Test Suite
In-reply-to: Msg of 21 Aug 1986 16:53-EDT from berman at vaxa.isi.edu (Richard Berman)
I agree that this is supposed to be a validation suite, and not a
comprehensive debugging suite. It should test that everything is there,
that it basically all works, and should especially stress those things
that might be the subject of misunderstandings. It is necessary to test
whether you can add a flonum to a bignum; it is not necessary to
test a few thousand pairs of random integers to make sure that the +
operator works for all of them.
-- Scott
∂22-Aug-86 0124 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Purpose of Test Suite
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43711; Fri 22-Aug-86 01:50:01-EDT
Date: Fri, 22 Aug 86 01:49 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Purpose of Test Suite
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608212053.AA29758@vaxa.isi.edu>
Message-ID: <860822014957.9.CFRY@JONES.AI.MIT.EDU>
Now that I have been experimenting a bit, I have come up against a question
that is a bit difficult to decide upon. From my understanding, I am putting
together a VALIDATION suite, the purpose of which is to determine the presence
and operating status of all the CL functions, variables, features, etc.
Is it also supposed to thoroughly test these things?
If there's much of a difference, we're in big trouble.
If somebody's implementation supports adding of all
integers except (+ 27491 -31200001), we can't be expected to find that out with the
validation suite.
That is, is this same suite responsible for determining such things as correct
operation at boundary conditions? How about esoteric interactions?
In the test of the "+" operation, what would you include? Obviously you want
to be sure that it works for each data type (and combination of data types)
that it is defined for. Also you want to make sure that positive/negative is
handled, etc. Beyond that, should it also check to see if, for example,
MOST-POSITIVE-FIXNUM + 1 causes an error? How about (+ 1 (1-
MOST-POSITIVE-FIXNUM)) causes no error? And so on for each of the number-type
boundaries.
I think the broader question you're asking is:
Should the validation suite simply test that things work the way they're supposed to
when they're supposed to, or should it also make sure that things DON'T WORK when they're
not supposed to work?
You can obviously expand either category to available memory.
For + on non-negative integers, I'd test:
(+)
(+ 0)
(+ 0 0)
(+ 2 3 4 5 6 7)
(+ nil) => should error
(+ "one") => another error case wouldn't hurt
Checking the cases using most-positive-fixnum is
a good idea and does appear to be necessary.
It's a lot of nit-picking work, though.
I'm glad I'm in MY sandals.
∂22-Aug-86 0125 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Re: Test Control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43710; Fri 22-Aug-86 01:39:48-EDT
Date: Fri, 22 Aug 86 01:39 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: Test Control
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608191835.AA12279@vaxa.isi.edu>
Message-ID: <860822013939.8.CFRY@JONES.AI.MIT.EDU>
Received: from MC.LCS.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 19 Aug 86 14:51-EDT
Received: from SAIL.STANFORD.EDU by MC.LCS.MIT.EDU 19 Aug 86 14:48:11 EDT
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 11:35:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA12279; Tue, 19 Aug 86 11:35:26 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608191835.AA12279@vaxa.isi.edu>
Date: 19 Aug 1986 1135-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Subject: Re: Test Control
In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
<860819034224.7.CFRY@JONES.AI.MIT.EDU>
Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
specification of compiled, evaled or both (for testing), I like it.
I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
:COMPILE or :EVAL, where :GLOBAL means that the global test controller will
decide whether the test is compiled and/or evald, and the other two values are
a "compile only" or "eval only" specifier, overriding the global control.
Almost right.
I
don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
meaning that the test may be compiled and/or evaled.
Nope. :GLOBAL should mean: get the kind of testing from the global variable
*global-test-kind*, which may take on the values:
:eval, :compile, or :both.
The question is, should the local version be able to say :compile when the global
version says :eval and vice versa?
Maybe in that case, that test would simply not get run.
[Say, something that only works compiled, and you're running all the tests
knowing that the compiler is completely broken, so don't run any compiled tests.]
Maybe GLOBAL should have precedence?
I know you say everything should work both compiled and evaled, and for
strictly VALIDATION purposes, you shouldn't need any of this.
But it would be useful if the same format for validation was
useful for code development. For one thing, it would simply get used
more and we'd get more validation tests.
For another, it would help developers.
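One way a runner could resolve local against global control (just a sketch of
the semantics I have in mind; nothing here is implemented):
(defvar *global-test-kind* :both)     ; :eval, :compile, or :both

(defun test-modes (local-control)
  ;; returns the list of modes a test should actually be run in,
  ;; or NIL meaning "skip this test for now"
  (let ((global (if (eq *global-test-kind* :both)
                    '(:eval :compile)
                    (list *global-test-kind*))))
    (case local-control
      (:global global)                          ; defer entirely to the controller
      ((:eval :compile)                         ; local restriction: run only if the
       (and (member local-control global)       ; controller also allows that mode
            (list local-control))))))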
NOTE: I am experimenting with a macro now that includes all the best features
we have seemed to agree upon. I am including the above feature, but naturally
it can be changed. In a few days I will post this preliminary macro. It is
not really a control macro, but simply defines the test in terms of the data
base. Currently I am using generic common-lisp for this organizing macro, and
I am not using FSD.
Right on!
Instead it creates a simpler database using lists, arrays
and property lists. This database is for testing only and the actual
organizing macro may stray from pure CL because it is intended for internal
use only. Of course, the files generated from the database will contain only
"pure" CL for testing purposes.
sounds good.
∂22-Aug-86 1054 berman@vaxa.isi.edu Re: Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 10:54:33 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06074; Fri, 22 Aug 86 10:54:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608221754.AA06074@vaxa.isi.edu>
Date: 22 Aug 1986 1054-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: CL-Validation@SU-AI.ARPA
Subject: Re: Test Control
In-Reply-To: Your message of Fri, 22 Aug 86 01:39 EDT.
<860822013939.8.CFRY@JONES.AI.MIT.EDU>
I am still not sure why :BOTH is needed. I believe that the purpose here is
to have individual tests be able to specify a limitation on how they may be
run. Obviously the vast majority of tests can be run either :COMPILEd or
:EVALed. It is only the rare test that must limit this with the inclusion of
a :EVAL or :COMPILE option. I recommend changing these names to :EVAL-ONLY
and :COMPILE-ONLY to clarify the meanings.
The test controller could be told to run every test compiled, evaled, or both.
Perhaps it would be useful to also say "run only the EVAL-ONLY tests", etc.
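Picking out such a subset would be easy if each test name records its control
option somewhere, e.g. on its property list (the bookkeeping names below are
only hypothetical):
(defun tests-with-control (kind)
  ;; e.g. (tests-with-control :eval-only)
  (remove-if-not #'(lambda (name) (eq (get name 'test-control) kind))
                 *list-of-test-names*))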
Does this seem useful? If not, please clarify for me just how :BOTH is
different from the union of :EVAL and :COMPILE.
Thanks
RB
∂24-Aug-86 1940 marick%turkey@gswd-vms.ARPA Purpose of Test Suite
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Aug 86 19:39:54 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA10093; Sun, 24 Aug 86 21:39:51 CDT
Message-Id: <8608250239.AA10093@gswd-vms.ARPA>
Date: Sun, 24 Aug 86 21:40:22 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Purpose of Test Suite
The validation suite should check that a Common Lisp system adheres to
the letter of the specification. I don't see that that's particularly
different from any test suite.
Of course, you quickly run into combinatorial explosion, so you have to
narrow your scope. Checking boundary conditions is known to be an
awfully productive way of testing, both because programmers often make
errors around boundaries and also because boundary condition tests can
be written quickly, without much thought.
Once the next version of the CL definition is available, it might be
useful to use it to drive the test suite. I could see something like
this:
Each "unit" of specification would contain a pointer to the appropriate
test. For example, the specification for #'+ will say that it takes 0
or more arguments. That sentence will point to a test that gives #'+
0 arguments and Lambda-Parameters-Limit arguments (the boundary
conditions). The FSD database ought to be able to support this.
It might also be useful to have a list of stock values to use for
testing. Each datatype contains classes of "equivalent values", and
these stock values would be the boundary values. For example, the stock
values for type fixnum would be most-negative-fixnum, -1, 0, +1, and
most-positive-fixnum. In some string tests I whipped off not too long
back, I used three stock strings: a simple-string, a string with one
level of displacement, a string with two levels of displacement,
including a displacement offset and a fill-pointer. (Guess what I was
testing.) These stock values have the advantage that they eliminate
some of the thinking required per test. The disadvantage is that they
institutionalize gaps in your test coverage.
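A table of stock values could be as simple as something like this (the names
and the particular values are only illustrative):
(defparameter *stock-values*
  ;; boundary representatives for a few types; each test draws its
  ;; arguments from whichever equivalence classes it cares about
  `((fixnum ,most-negative-fixnum -1 0 1 ,most-positive-fixnum)
    (string "" "a" ,(make-string 100 :initial-element #\x))
    (list   nil (a) (a b c))))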
I don't know that this is practical at this late date.
Brian Marick
∂25-Aug-86 1221 berman@vaxa.isi.edu TEST MACRO
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:20:45 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02269; Mon, 25 Aug 86 12:21:34 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251921.AA02269@vaxa.isi.edu>
Date: 25 Aug 1986 1221-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: TEST MACRO
Here is the current version of the test macro stuff. Note that this is an
organizing macro, to create the database. The variable LIST-OF-ITEMS is not
defined here - it contains a listing of all the CL function, macro, variable
names, etc.
I am not 100% happy with the current version, and I look forward to your
suggestions. Remember, this creates a data base. The main requisite is that
this macro must embody all the necessary info for the management and running
of the tests. My next message will contain some samples.
;; -*- Mode:Common-Lisp; Base: 10; Package:cl-tests -*-
(in-package 'cl-tests)
(defvar *list-of-test-names* nil)
(defvar *list-of-test-seq-names* nil)
; ADD-TEST does the work of putting the test into the database.
; It does not do any testing.
; NOTE: This version is for testing. It does not use FSD,
; but should work in any Common Lisp. See DEFTEST for
; descriptions of the arguments.
(defmacro add-test (item name type contrib$ setup testform unsetup
failform error$ doc$ name-add control)
(putprop name item 'test-of) ; note what it is a test of.
(putprop name type 'test-type)
(putprop name contrib$ 'test-contributor)
(putprop name setup 'test-setup)
(putprop name testform 'test-form)
(putprop name unsetup 'test-unsetup)
(putprop name failform 'test-failform)
(putprop name error$ 'test-error$)
(putprop name doc$ 'test-doc$)
(putprop name control 'test-control)
(and name-add
(putprop item (cons name (get item 'tests)) 'tests)
(push name *list-of-test-names*))
`',name)
; DEFTEST is used to define a test. It puts the test into a database.
; The arguments are:
; ITEM which is one of the common lisp function names, variables, macro names,
; etc. or a subject name. The name must be present in the organizing
; database.
; NAME must be a unique symbol for this test.
; TYPE is optional, defaulting to NOEVAL. It must be one of NOEVAL,
; EVAL or ERROR. NOEVAL means the testform eval section is
; evaluated and compared (using the indicated compare in the testform)
; with the unevaluated compare section. EVAL means both halves
; are evaluated and compared. ERROR means the form should produce
; an error.
; TESTFORM is the test form, composed of 1 or 3 parts. If this is
; an ERROR test, TESTFORM is an expression which must produce
; an error. Otherwise there are 3 parts. The first is the
; eval form, which is evaluated. The second is a form which
; can be used as a function by APPLY, taking two arguments and
; used to compare the results of the eval form with the third
; part of the TESTFORM, the compare form. The compare form is
; either evaluated (type EVAL) or not (type NOEVAL).
; The remaining arguments are optional, referenced by keywords. They are:
; :CONTRIB$ is a documentation string showing the originator of the test.
; If unspecified or NIL it gets its value from CL-TESTS:*CONTRIB$*
; :FAILFORM is a form to evaluate in the event that an unexpected error
; was generated, or the comparison failed.
; :ERROR$ is a string to print out if the comparison fails.
; :SETUP is a form to evaluate before TESTFORM.
; :UNSETUP is a form to evaluate after TESTFORM.
; :DOC$ is a string documenting this test. If not specified (or nil) it
; gets its value from the global variable CL-TESTS:*DOC$*
; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE. If it is :GLOBAL,
; it means that the test controller will decide when/if to eval and
; compile the test. If it is :EVAL, then the test will ignore
; controller attempts to compile it, and if it is :COMPILE the
; controller cannot eval it. The default is :GLOBAL.
(defvar *CONTRIB$* nil)
(defvar *DOC$* nil)
(defmacro DEFTEST ((item name &optional (type 'noeval)) testform
&key (contrib$ *contrib$*) (failform nil) (error$ nil) (setup nil)
(unsetup nil) (doc$ nil) (name-add t)(control :GLOBAL))
(cond ((null(memq item list-of-items))
(error "'~s' is not a CL item or subject.~%" item))
((null(memq type '(noeval eval error)))
(error "The test-type ~s is not one of NOEVAL, EVAL or ERROR."))
((null(stringp contrib$))
(error "The contributor, ~s, must be a string." contrib$))
((null(or (null error$) (stringp error$)))
(error ":ERROR$ must be a string."))
((null (or (null doc$) (stringp doc$)))
(error ":DOC$ must be a string."))
((null (memq control '(:GLOBAL :EVAL :COMPILE)))
(error ":CONTROL must be one of :GLOBAL, :EVAL or :COMPILE."))
((memq name *list-of-test-names*)
(error "The test name ~s has already been used!" name)))
`(add-test ,item ,name ,type ,contrib$
,setup ,testform ,unsetup ,failform ,error$
,(or doc$ *doc$*) ,name-add ,control)) ; put it on the item.
; The format for test sequences is:
; (DEFTEST-SEQ (item seq-name)
; (((test-name <type>) testform <key-word data>)
; ((test-name <type>) testform <key-word data>) ... )
; :CONTRIB$ <contributor-string>
; :SETUP <setup form>
; :UNSETUP <unsetup form>
; :DOC$ <documentation string>
(defmacro add-test-seq (item seq-name test-names contrib$ setup unsetup doc$)
(putprop seq-name item 'test-seq-of)
(putprop seq-name contrib$ 'test-seq-contributor)
(putprop seq-name setup 'test-seq-setup)
(putprop seq-name test-names 'test-seq-names)
(putprop seq-name unsetup 'test-seq-unsetup)
(putprop seq-name doc$ 'test-seq-doc$)
(putprop item (nconc (get item 'test-seqs) (list seq-name)) 'test-seqs)
(push seq-name *list-of-test-seq-names*)
`',seq-name)
(defmacro add-1-seq (item a-test contrib$)
`(deftest (,item ,@ (car a-test))
,(second a-test)
:contrib$ , contrib$
,@ (cddr a-test)
:name-add nil))
(defmacro DEFTEST-SEQ ((item seq-name) test-seq
&key (contrib$ *contrib$*) (setup nil) (unsetup nil) (doc$ *doc$*))
(cond ((null(memq item list-of-items))
(error "'~s' is not a CL item or subject.~%" item))
((null(stringp contrib$))
(error "The contributor must be a string."))
((null (or (null doc$) (stringp doc$)))
(error ":DOC$ must be a string."))
((memq seq-name *list-of-test-seq-names*)
(error "The test-sequence name ~s has already been used!" seq-name)))
(let (test-names)
(dolist (a-test test-seq)
(setq test-names
(nconc test-names
(list (eval `(add-1-seq ,item ,a-test ,contrib$))))))
`(add-test-seq ,item
,seq-name
,test-names
,contrib$
,setup
,unsetup
,doc$)))
∂25-Aug-86 1225 berman@vaxa.isi.edu Test-Macro examples
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:24:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02290; Mon, 25 Aug 86 12:25:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251925.AA02290@vaxa.isi.edu>
Date: 25 Aug 1986 1225-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Test-Macro examples
Here are some samples. They are transliterated from the CDC test suite, so
please, no flames over content.
;; -*- Mode:Common-Lisp; Base: 10; Package:cl-tests -*-
(in-package 'cl-tests)
;*******************************************************************
;; ACONS test.
(setq *contrib$* "CDC. Test case written by Richard Hufford.")
(setq *doc$* nil)
(deftest
(acons acons-1)
((acons 'frog 'amphibian nil) equal ((frog . amphibian)))
:doc$ "ACONS to NIL")
(deftest
(acons acons-2)
((acons 'frog
'amphibian
'((duck . bird)(goose . bird)(dog . mammal)))
equal
((frog . amphibian)(duck . bird)(goose . bird)(dog . mammal)))
:doc$ "acons to a-list")
(deftest
(acons acons-3)
((acons 'frog nil nil) equal ((frog)))
:doc$ "acons nil datum")
(deftest
(acons acons-4)
((acons 'frog
'(amphibian warts webbed-feet says-ribbet)
nil)
equal
((frog . (amphibian warts webbed-feet says-ribbet))))
:doc "acons with list datum")
;*******************************************************************
;; ACOSH test.
(deftest-seq
(acosh cdc-acosh-tests)
(((acosh-1)
((ACOSH 1.0000) ACOSH-P 0.0000))
((acosh-2)
((ACOSH 1.0345) ACOSH-P 0.26193))
((acosh-3)
((ACOSH 1.1402) ACOSH-P 0.5235))
((acosh-4)
((ACOSH 1.3246) ACOSH-P 0.7854))
((acosh-5)
((ACOSH 1.6003) ACOSH-P 1.0472))
((acosh-6)
((ACOSH 1.9863) ACOSH-P 1.3090))
((acosh-7)
((ACOSH 2.5092) ACOSH-P 1.5708))
((acosh-8)
((ACOSH 3.2051) ACOSH-P 1.8326))
((acosh-9)
((ACOSH 4.1219) ACOSH-P 2.0944))
((acosh-10)
((ACOSH 5.3228) ACOSH-P 2.3562))
((acosh-11)
((ACOSH 6.8906) ACOSH-P 2.6180))
((acosh-12)
((ACOSH 8.9334) ACOSH-P 2.8798))
((acosh-13)
((ACOSH 11.5920) ACOSH-P 3.1416))
((acosh-14)
((ACOSH 15.0497) ACOSH-P 3.4034))
((acosh-15)
((ACOSH 19.5448) ACOSH-P 3.6652))
((acosh-16)
((ACOSH 25.3871) ACOSH-P 3.9270))
((acosh-17)
((ACOSH 32.9794) ACOSH-P 4.1888))
((acosh-18)
((ACOSH 42.8450) ACOSH-P 4.4506))
((acosh-19)
((ACOSH 55.6640) ACOSH-P 4.7124))
((acosh-20)
((ACOSH 72.3200) ACOSH-P 4.9742))
((acosh-21)
((ACOSH 93.9611) ACOSH-P 5.2360)))
:setup (DEFUN ACOSH-P (ARG1 ARG2)
(PROG (RES)
(COND ((= ARG1 ARG2) (RETURN T))
((= ARG2 0.0) (RETURN (AND (> ARG1 -1E-9)
(< ARG1 1E-9))))
(T (SETQ RES (/ ARG1 ARG2))
(RETURN (AND (> RES 0.9999)
(< RES 1.0001)))))))
:unsetup (fmakunbound 'acosh-p)
:contrib$ "CDC. Test case written by BRANDON CROSS, SOFTWARE ARCHITECTURE AND ENGINEERING"
:doc$ nil)
∂25-Aug-86 1255 berman@vaxa.isi.edu Purpose
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:55:26 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02551; Mon, 25 Aug 86 12:56:25 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251956.AA02551@vaxa.isi.edu>
Date: 25 Aug 1986 1256-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Purpose
From Fahlman I get that the purpose is basically to see that the spec (or
whatever) is checked, rather than a sweepingly deep test. Marick seems to
feel that checking "that a Common Lisp system adheres to the letter of the
specification" is not "particularly different from any test suite". Yet it
seems that a vendor's test suite (and I have reviewed about 6 major ones now)
is designed more towards testing both adherence to the spec and specific
areas of interest/problems in that implementation.
Marick's comments re "stock values" seem somewhat useful. Certainly adding
zero and -1 is sufficient to test the handling of both zero and -1 for
addition. I don't then need to add -7 and 2 to test for correct handling of
negatives. Fahlman basically said that testing the functions for each of the
data types it should handle was important. I think that this (data type
handling) and boundary conditions pretty much sum up the nature of the
validation suite which therefore should:
1. Test for the presence of all Common Lisp pre-defined objects.
2. Test for correct definition by:
a. Testing for the data type (e.g. Function, Constant, etc.) of each
of these objects.
b. Evaluating constants and variables for correct value.
c. Applying functions/macros to a sufficiently broad range of
arguments so as to ascertain the functionality for each type of argument and
combination of types.
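As a rough illustration of 1 and 2a, ordinary CL predicates suffice (the
classification keywords below are just for the example):
(defun check-presence (item kind)
  ;; T if ITEM is present as the indicated kind of predefined object
  (ecase kind
    (:function (and (fboundp item) (not (macro-function item))))
    (:macro    (not (null (macro-function item))))
    (:variable (boundp item))
    (:constant (and (boundp item) (constantp item)))))

(check-presence 'car :function)                     ; => T
(check-presence 'push :macro)                       ; => T
(check-presence 'most-positive-fixnum :constant)    ; => T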
Also, a few interaction tests are in order. By this I mean the testing of
more complex forms, and I am thinking specifically of scoping.
Obviously this test suite will not cover in any way extensions made to the
language. I know that such things as error handling and object oriented
programming are being addressed, but so far these very important areas remain
undetermined. Should I also make this same data base (and its corresponding
test-file making utilities, etc.) available for this vendor-specific use? I
don't even know if I CAN do this without some kind of semi-legal hassle
because at present all contributions are public domain. But it would be nice
to have the same test format for everything.
As I must use FSD, I cannot easily give away the actual database stuff. So
far it is all in straight CL, but this is only because FSD is not yet running
on the TI explorer. This is imminent, but I will try (no promise) to keep
some kind of CL version of the database stuff around. If it gets too complex
(which is what FSD is good at handling) I may have to cease working on a
straight CL version.
So.............
Comments???? Is this the correct statement of the purpose and direction I
should use in putting this thing together?
Thanks.
RB
∂27-Aug-86 0041 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU TEST MACRO
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86 00:41:02 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44090; Wed 27-Aug-86 03:43:31-EDT
Date: Wed, 27 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608251921.AA02269@vaxa.isi.edu>
Message-ID: <860827034240.5.CFRY@JONES.AI.MIT.EDU>
(defmacro add-test (item name type contrib$ setup testform unsetup
failform error$ doc$ name-add control)
(putprop name item 'test-of) ; note what it is a test of.
(putprop name type 'test-type)
(putprop name contrib$ 'test-contributor)
(putprop name setup 'test-setup)
(putprop name testform 'test-form)
(putprop name unsetup 'test-unsetup)
(putprop name failform 'test-failform)
(putprop name error$ 'test-error$)
(putprop name doc$ 'test-doc$)
Usually doc should default to something
(putprop name control 'test-control)
(and name-add
(putprop item (cons name (get item 'tests)) 'tests)
(push name *list-of-test-names*))
`',name)
; TESTFORM is the test form, composed of 1 or 3 parts. If this is
; an ERROR test, TESTFORM is an expression which must produce
; an error. Otherwise there are 3 parts. The first is the
; eval form, which is evaluated. The second is a form which
; can be used as a function by APPLY, taking two arguments and
; used to compare the results of the eval form with the third
; part of the TESTFORM, the compare form.
I prefer Lisp syntax: compare form first! Then test form, then expected result.
Make it look like a function call, i.e. a list of 3 elements.
Infix is good for mathematicians who don't understand elegant syntax.
The compare form is
; either evaluated (type EVAL) or not (type NOEVAL).
Always evaluate it. Specify no-eval by putting a quote in front of it!
[not necessary in case it's a number, string, character, keyword, etc.]
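That is, every testform would read like an ordinary call:
(equal (acons 'frog 'amphibian nil) '((frog . amphibian)))   ; literal expected value: quote it
(eql (+ 2 2) (* 2 2))                                        ; computed expected value: just evaluate it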
; :FAILFORM is a form to evaluate in the event that an unexpected error
; was generated, or the comparison failed.
How about having the default print to *error-output* a composed message like:
"In test FROBULATOR, (foo) should have returned 2 but returned 3 instead."
; :ERROR$ is a string to print out if the comparison fails.
Do we need both failform and error$ ?
If the test fails, evaluate the value of :failform, which prints out the standard message.
It's rare that you'd want to do something other than the default.
Maybe it would be good to have the default behavior come from
global var *test-fail-action*, so someone could generate their own
format of reporting bugs.
; :SETUP is a form to evaluate before TESTFORM.
; :UNSETUP is a form to evaluate after TESTFORM.
; :DOC$ is a string documenting this test. If not specified (or nil) it
; gets it value from the global variable CL-TESTS:*DOC$*
Which itself defaults to "" .
; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE. If it is :GLOBAL,
; it means that the test controller will decide when/if to eval and
; compile the test. If it is :EVAL, then the test will ignore
; controller attempts to compile it, and if it is :COMPILE the
; controller cannot eval it. The default is :GLOBAL.
Sounds good. Actually your names of :eval-only and :compile-only are
clearer, but just so long as we all agree upon the semantics.
; (DEFTEST-SEQ (item seq-name)
I'd hope most of the time to never have to see a call to
deftest-seq. Something should just go over a whole file
and make it one big call to deftest-seq.
But it's nice to have for obscure cases and non-file modularity.
I notice some dollar sign suffixes in the code.
How about a DIAG package to avoid name conflicts?
Of course, the package system has to be working for you to run your
diagnostics, but ...
∂27-Aug-86 1211 berman@vaxa.isi.edu TEST MACRO - Fry's Comments
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86 12:11:01 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA18922; Wed, 27 Aug 86 12:11:08 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608271911.AA18922@vaxa.isi.edu>
Date: 27 Aug 1986 1211-PDT (Wednesday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: TEST MACRO - Fry's Comments
doc$ DOES default to a global value.
As for TESTFORM -- it is already Lisp syntax, with the following proviso: it must
be of the form (predicate arg1 arg2), where predicate is an object which can be
applied to arg1 and arg2. E.g. (not (eq arg1 arg2)) is no good, but (neq arg1
arg2) is OK. The exception (per the comments in my code) is an ERROR type of
test.
"Always evaluate it [the compare form]". I took my current default
from Marick (that is, the compare form is not evaluated unless you specify
EVAL) after looking over a lot of different companies' test suites. By FAR
the vast majority of tests were of the NOEVAL variety. This will almost
certainly stand as the default.
:FAILFORM is very different from ERROR$. Per my original posting regarding
the macro, FAILFORM is optional (and will be rarely used at this point). It
is to help analyze an error (or pattern of errors) further. It is used for
testing beyond the "first order", where "first order" means simple error
testing. For example, one may wish for a :FAILFORM to maintain a list of
tests that have failed for a later analysis. :ERROR$ is simply a message to
print out. Actually, it might be nice if :ERROR$ was a format string with
some kind of argument capability, but this may be dangerous in a testing
environment since FORMAT is such a hairy function.
I like the idea of a global default *TEST-FAIL-ACTION*. I would then add an
:IF-FAIL keyword. This is different from :FAILFORM in that :FAILFORM is sort
of an :AFTER mix-in for the standard test-fail-action (or maybe a :BEFORE???,
any preference?) rather than a replacement for the standard fail action.
:IF-FAIL would therefore allow one to replace the standard test-fail action,
while :FAILFORM would be the "mix-in" to the fail action. This is a useful
separation, especially when prototyping tests where :FAILFORM may not change
at the same rate as :IF-FAIL. I hope this paragraph is clear. Whew.
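A sketch of how the three pieces might fit together when a test fails (the
names follow the discussion above; none of this is implemented yet):
(defvar *test-fail-action*
  ;; default failure reporter: a function of the test name and a description
  #'(lambda (name description)
      (format *error-output* "~&Test ~S failed: ~A~%" name description)))

(defun report-failure (name description &key if-fail failform)
  (funcall (or if-fail *test-fail-action*) name description)  ; :IF-FAIL replaces the default action
  (when failform (eval failform)))                            ; :FAILFORM runs afterward, as the mix-in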
Yeah, we'll go to :COMPILE-ONLY and :EVAL-ONLY, with the previously defined
semantics, ok?
As for DEFTEST-SEQ...it is very necessary, and came about as a direct result
of working with existing test suites. This is used when you have auxiliary
functions, macros, variables, etc., which must exist at the time the
sequence of tests is run. It is not always used just for ordering tests. For
example, in the CDC suite they have a function for comparing two numbers
within a certain tolerance which is used as part of the test for #'+. All the
tests of #'+ use this as the predicate. So, all the #'+ tests are wrapped in
a DEFTEST-SEQ with the definition of this predicate in the :SETUP slot. In
this case, the actual temporal sequence of the tests is unimportant. Another
use for DEFTEST-SEQ is when the test sequence is itself important.
Don't forget that each test will become an object in a database, and an
extraction routine will build the files which you will then load as a test
suite. Thus with this paradigm, you MUST associate any auxiliary environmental
factors as part of the relevant tests, otherwise there is no way at
file-building time to determine what predicates should be defined where.
As you said, "It's nice to have for...non-file modularity", which is exactly
the case.
As for dollar-sign suffixes -- that's a holdover from BASIC, and is short for
"string". It isn't an attempt to avoid name conflicts. HOWEVER...I have been
meaning to stick all this stuff in its own package anyway.
And, yeah, the package system has to be working, but...
Thanks a lot for your comments. To summarize, the things I agree with:
Prefix syntax for TESTFORM, with the mentioned proviso. Some kind of global
*TEST-FAIL-ACTION*. Using the names :EVAL-ONLY and :COMPILE-ONLY. A Package
for test stuff. I disagree with: always evaluating the compare form. And,
lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
some misunderstanding of an earlier message.
Sha-Boom.
RB
∂28-Aug-86 1308 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU TEST MACRO - Fry's Comments
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 28 Aug 86 13:08:00 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44202; Thu 28-Aug-86 16:10:56-EDT
Date: Thu, 28 Aug 86 16:09 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO - Fry's Comments
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608271911.AA18922@vaxa.isi.edu>
Message-ID: <860828160939.2.CFRY@JONES.AI.MIT.EDU>
Thanks a lot for your comments. To summarize, the things I agree with:
Prefix syntax for TESTFORM, with the mentioned proviso. Some kind of global
*TEST-FAIL-ACTION*. Using the names :EVAL-ONLY and :COMPILE-ONLY. A Package
for test stuff.
Good.
I disagree with: always evaluating the compare form. And,
lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
some misunderstanding of an earlier message.
I think the real thrust of my arguments was just to try to cut down the number of
keyword args in this test macro, and thus make it easier to remember what's going on.
Always evaling the comparison form cuts out the :eval-compare-form, and
just having one action taken when a test fails cuts out one of
:failform or :error$. You'll be using the test stuff more than anyone so you
will have implimentors myopia disease which is:
"You can remember all this stuff because you work with it daily."
But you also have the insight from being most experienced with the problem
and have the distinct advantage of implementing the code.
Please consider us less-frequent users when you add a new and/or confusing
feature [where confusing means non-Lisp-like].
Fry